Materials and Methods


Enrollment 

Upon enrollment in Darwin’s Ark (https://darwinsark.org), owners were asked to 
provide consent for participation and information about their dog’s approximate birth 
date, sex and spay/neuter status, suspected or known breed(s), purebred registration, 
and/or photograph. We assigned dogs to nine major regions of the contiguous United
States based on participant zip code, and classified each dog’s location as urban, suburban,
or rural based on population density, as defined by the U.S. Census Bureau Decennial
Census of 2010 (www.census.gov). We limited our analyses to data obtained until November 15th, 2019 (“data
freeze” date). Dogs were prioritized for DNA sequencing by completeness of surveys, 
enrollment date, and distribution of survey responses. 


Survey collection 

This manuscript includes data from the initial 11 behavioral surveys added to the 
Darwin’s Ark project (ten questions each) (see Data and materials availability), and 
one survey about physical characteristics (eight questions), for a total of 118 questions 
(table S1) (the Darwin’s Ark site currently includes 22 surveys with 8-10 questions each). 
The surveys were offered to owners in a static order, two at a time, on the owner’s main 
account page (dubbed “My Laboratory”), although participants could also opt to answer
them in any order by selecting “View All”. Participants can opt to retake surveys, and 
both the original and retake responses are stored. For all analyses described here, only the 
original answer is used. 


When survey questions are offered to owners, the text of the question is 
automatically updated for their dog. The variable text is capitalized when questions are 
described in our data files. For example, the size question, “When DOG is standing next 
to someone of average height, how high are HIS shoulders?” would be modified to 
replace DOG with the dog’s name, and HIS with the appropriately gendered pronoun
based on the owner’s report of the dog’s sex. We include the dog’s name in every survey 
question, and pronouns as needed. This personalization ensures that owners of multiple 
enrolled dogs answer the question for the correct dog. 
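The placeholder substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not the Darwin’s Ark implementation: the function name and pronoun table are hypothetical, and the HE/HIM tokens are assumptions added for illustration; only the DOG and HIS placeholders come from the text.

```python
import re

# Hypothetical pronoun table keyed by owner-reported sex. Only HIS appears in
# the text; HE and HIM are assumed here for illustration.
PRONOUNS = {
    "male": {"HIS": "his", "HE": "he", "HIM": "him"},
    "female": {"HIS": "her", "HE": "she", "HIM": "her"},
}


def personalize(template: str, dog_name: str, sex: str) -> str:
    """Fill a question template with the dog's name and sex-appropriate pronouns."""
    pronouns = PRONOUNS[sex]
    # Replace whole-word pronoun tokens first, so a dog whose name happens to
    # be "HE" is not rewritten by a second pass.
    text = re.sub(r"\b(HIS|HE|HIM)\b", lambda m: pronouns[m.group(1)], template)
    # A function replacement keeps any backslashes in the name literal.
    return re.sub(r"\bDOG\b", lambda _: dog_name, text)
```

For example, `personalize("When DOG is standing next to someone of average height, how high are HIS shoulders?", "Luna", "female")` yields the question with "Luna" and "her" filled in.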


All 110 behavioral questions used a 5-point Likert scale: (1) 81 questions had 
options of strongly agree, agree, neither agree nor disagree, disagree, or strongly 
disagree; (2) 29 had options of never, rarely, sometimes, often, or always. Responses
were stored as integer codes {0,1,2,3,4}.
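A sketch of the integer coding, assuming the codes follow the order in which the options are listed above (strongly agree = 0 through strongly disagree = 4; never = 0 through always = 4). The text does not state the direction explicitly, although the DPQ validation describes the agreement scale as keyed from “Strongly Agree” to “Strongly Disagree”; the names below are hypothetical.

```python
# Option orders as listed in the text; mapping the first option to 0 is an assumption.
AGREEMENT = ["strongly agree", "agree", "neither agree nor disagree",
             "disagree", "strongly disagree"]
FREQUENCY = ["never", "rarely", "sometimes", "often", "always"]


def encode_likert(response: str, scale: list) -> int:
    """Return the 0-4 code of a response from its position in the scale."""
    return scale.index(response.strip().lower())
```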


We sourced 79 behavioral questions from published and validated canine behavioral 
and health surveys: (a) Dog Personality Questionnaire (DPQ / DPQL; 45 questions) (37); 
(b) Canine Health-related Quality of Life Survey (CHQLS; 11 questions) (36); (c) Dog 
Impulsivity Assessment Scale (DIAS; 18 questions, including one also in DPQ) (34); (d) 
Canine Cognitive Dysfunction Rating scale (CCDR; 6 questions) (35). 


We also included 31 new behavior questions and eight about breed-defining 
aesthetic traits (table S1). We developed the new behavior questions with input from 
animal behavior professionals associated with the International Association of Animal
Behavior Consultants. Our goal was to identify behaviors that were both heritable and
easy for owners to identify, and thus well suited for a community science behavioral 
genetics project. We first collated a list of 46 possible question topics based on the 
professionals’ initial suggestions, and then asked them to score each topic on eight criteria,
each on a 5-point scale: (1) incidence in the pet dog population (rare to common); (2) how
easily the behavior is observed by owners (hard to easy); (3) whether the behavior is
quantitative or binary; (4) whether the environment is likely to have a major effect (likely
to unlikely); (5) how malleable the behavior is (prone to resistant); (6) whether the owner is
likely to assign a value judgment to the behavior, and thus to try to train or untrain it
(biased to neutral); (7) whether the prevalence of the behavior tends to differ between breeds,
suggesting heritability (low to high); (8) whether spay/neuter status would have a strong
effect (strong to weak). From this, we identified 31 questions that scored at least
moderately highly on all criteria and were not included in existing surveys. Our
subsequent heritability analysis showed a disproportionately high number of the new
questions in the top quantile of heritability (12 of 31, or 39%; p = 0.04, one-sample
proportion test without continuity correction).
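The text does not give the null proportion for this test. Assuming “top quantile” means the top quartile of questions (null proportion 0.25) and a one-sided alternative, a one-sample proportion z test without continuity correction gives z ≈ 1.76 and p ≈ 0.039, which is consistent with the reported p = 0.04. The sketch below uses only the standard library; the null proportion is our assumption.

```python
import math


def one_sample_prop_test(successes: int, n: int, p0: float):
    """One-sample z test of a proportion, without continuity correction.

    Returns (z, p), where p is the one-sided P(Z >= z) under the null proportion p0.
    z**2 equals the X-squared statistic of R's prop.test(x, n, p0, correct = FALSE).
    """
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p_upper


# Assumed null: the top quartile of heritability holds 25% of the questions.
z, p = one_sample_prop_test(12, 31, 0.25)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```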


We also included a survey with eight questions about physical characteristics. These had
more variable answer options, including one question that allowed more than one answer
to be selected (Q#122, “What color is DOG? Select all that apply.”, with eight options) (table
S1). Ticking coat pattern phenotypes were validated by manually examining photos for dogs
that had them. Another four questions asked owners to select an answer based on an accompanying
graphic showing a range of responses (fig. S3). When physical trait questions were
analyzed as quantitative traits in subsequent analyses, answers of “I'm not sure”, “I don't
know”, “Not sure” and, for question #125, “Surgically cropped ears” were set to missing
(NA).
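The recoding rule above can be expressed as a small function. This is a hypothetical sketch: it assumes answers are stored as the exact strings quoted in the text (with straight apostrophes) and uses None to stand in for NA.

```python
# Non-informative answers set to missing (None, i.e. NA) before quantitative
# analysis. Straight apostrophes are assumed; stored answers may differ.
NOT_SURE = {"I'm not sure", "I don't know", "Not sure"}
CROPPED = "Surgically cropped ears"


def to_quantitative(question: int, answer: str):
    """Return None for non-informative answers, otherwise the answer unchanged."""
    if answer in NOT_SURE:
        return None
    if question == 125 and answer == CROPPED:  # applies to the ear question (#125) only
        return None
    return answer
```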


Phenotypes derived from survey data 

Phenotypes were primarily defined directly as behavioral factor scores, responses to 
behavioral questions, and responses to physical trait questions. For quantitative analyses, 
scores were normalized by calculating the standard score; normalized scores are included 
in the shared survey data files (see Data and materials availability). 
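The standard score subtracts the mean and divides by the standard deviation. A minimal sketch, assuming missing answers have already been removed and that the sample standard deviation (n − 1 denominator, as in R's sd()) is used; the text does not specify either detail.

```python
import statistics


def standard_scores(values):
    """Standard (z) scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with an n - 1 denominator
    return [(v - mean) / sd for v in values]
```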


Birth dates and age 

All responses to survey questions were timestamped in POSIX time format. 
Prior to July 3rd, 2018, age and birth dates for enrollment were collected as free response 
entries. To standardize these as birth dates in international format (YYYY-MM-DD) for
estimation of age, we performed the following steps using a combination of
functions from the R packages ‘data.table’, ‘stringr’, ‘anytime’, and ‘lubridate’:

1. For dogs with a parsable birth date, convert it directly to YYYY-MM-DD.
2. For dogs with a year and month, assign the birth date YYYY-MM-01.
3. For dogs with a year only, assign the birth date YYYY-01-01.
4. For dogs with no parsable birth date but an age given in years and/or months, parse
the age into a duration and subtract it from the date of the earliest survey to estimate
the birth date.
5. After each of the above steps:
   a. Set all birth dates before January 1st, 1980 to NA.
   b. Set all birth dates postdating survey response dates to NA.
6. Parse the remaining age or birth date entries by hand, if the owner gave an
interpretable free response.
7. Otherwise, set the birth date to NA.
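Steps 1 to 5 can be sketched in Python as a stand-in for the R pipeline. The accepted input forms (“2015-06-12”, “June 2015”, “2015”, “2 years 3 months”) are hypothetical examples; the original used ‘anytime’ for much broader date parsing. Durations use lubridate's lengths (365.25-day years, 30.4375-day months). Returning NA when a step's result fails the step-5 checks, rather than trying later steps, is our interpretation.

```python
import re
from datetime import date, timedelta

EARLIEST = date(1980, 1, 1)
MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}
DAYS_PER_YEAR = 365.25    # length of a lubridate duration year
DAYS_PER_MONTH = 30.4375  # length of a lubridate duration month


def _check(d: date, survey_date: date):
    """Step 5: set birth dates before 1980-01-01 or after the survey date to NA (None)."""
    return d if EARLIEST <= d <= survey_date else None


def standardize_birth_date(entry: str, survey_date: date):
    """Steps 1-4 for one free-text entry; returns a date, or None (NA) for manual review."""
    text = entry.strip().lower()
    try:
        # Step 1: a full date, here only the YYYY-MM-DD form.
        if m := re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", text):
            return _check(date(int(m[1]), int(m[2]), int(m[3])), survey_date)
        # Step 2: month name and year, e.g. "june 2015" -> 2015-06-01.
        m = re.fullmatch(r"([a-z]{3})[a-z]*\.?\s+(\d{4})", text)
        if m and m[1] in MONTHS:
            return _check(date(int(m[2]), MONTHS[m[1]], 1), survey_date)
        # Step 3: year only -> YYYY-01-01.
        if re.fullmatch(r"\d{4}", text):
            return _check(date(int(text), 1, 1), survey_date)
    except ValueError:  # impossible calendar dates such as 2015-02-30
        return None
    # Step 4: an age in years and/or months, subtracted from the earliest survey date.
    years = re.search(r"(\d+(?:\.\d+)?)\s*(?:years?|yrs?)\b", text)
    months = re.search(r"(\d+)\s*(?:months?|mos?)\b", text)
    if years or months:
        days = ((float(years[1]) * DAYS_PER_YEAR if years else 0.0)
                + (int(months[1]) * DAYS_PER_MONTH if months else 0.0))
        return _check(survey_date - timedelta(days=days), survey_date)
    # Steps 6-7: anything else is left for manual review and is NA by default.
    return None
```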


Sex and sterilization status 

Half of the dogs in our cohort (50.6%) were female, and 89.8% were spayed or
neutered, higher than the ~70% reported in the most recent American Veterinary Medical
Association demographic sourcebook (18). Sterilization rates for mutts in our data were
1.26x higher than for purebred dogs, close to the 1.19x reported elsewhere (108).
Sterilization status did not substantially change the effect of sex estimated for survey
responses. The ANOVA effect sizes (generalized eta squared, ges) of sex on behavioral survey
responses were highly correlated between intact and sterilized dogs (Pearson's R = 0.997,
p = 4.04 × 10^-142; N_intact = 1,049, N_sterilized = 13,278). In the final ANOVA analysis, we used four
discrete sex categories: female (intact); female (sterilized); male (intact); male
(sterilized).


Validation of size phenotypes 

We validated the owner's responses to question #121, “When DOG is standing next 
to someone of average height, how high are HIS shoulders?” (see Survey collection) 
using both individual size measurements (taken by owners and by non-owners) and
breed-average heights (data S1). Validation set #1: 337 dogs whose owners were
provided with a measuring tape by mail and instructed to measure the height from their
dog’s shoulder to the ground (fig. S2B,C). Validation set #2: 38 dogs recruited during
the 2017 Somerville Dog Festival in Somerville, MA. Owners were asked to complete
the Darwin’s Ark surveys, and responses were compared to the height-to-withers
measurement taken at the event by Darwin’s Ark staff members (fig. S2D). Validation
set #3: For 2,025 purebred dogs from breeds with average heights given in (109), we
compared owner-reported size to the average height for their breed (fig. S2E). Pearson's
correlation was calculated for each validation set using ‘cor.test()’ in the R package ‘stats’
(version 4.1.1).


Validation of behavioral questions 

We verified that a subset of behavioral questions derived from published questionnaires
performed as expected by comparing question-question correlations reported for the
Dog Personality Questionnaire (DPQ) (37) with those in our survey data. The original DPQ
reported question-question correlations across 2,556 dogs (Study 3). The DPQ used a 7-point
Likert scale keyed from “Strongly Disagree” to “Strongly Agree”, though a subset of
questions had responses reverse-keyed. The Darwin’s Ark implementation of 48 DPQ
items used a 5-point Likert scale keyed from “Strongly Agree” to “Strongly Disagree”
(reverse direction, no re-keying of responses). We matched question numbers from the
original DPQ (Appendix G) to question numbers from Darwin’s Ark surveys and
extracted the matrix of question-question correlations from the original DPQ (from
Appendices E and F). We reversed the sign of original DPQ correlations (r × -1) for question
pairs in which one question was indicated as re-keyed. For the same 48 questions sourced
from the DPQ, we obtained the Darwin’s Ark question-question correlation matrix across
10,253 dogs after normalization. We converted correlations to correlation distances
(d = 1 - |r|) and performed a Mantel test between the original DPQ matrix and the
Darwin’s Ark DPQ matrix of correlation distances using the R package ‘ade4’ with 100,000
replicates, obtaining a Mantel’s R of 0.9466711 (p = 1 × 10^-5). We were not able to do a
similar analysis of the other questionnaires used to source questions, as the published data
did not include question-question correlations.
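The Mantel test compares two distance matrices by correlating their upper-triangle entries and building a null distribution by jointly permuting the rows and columns of one matrix. The pure-Python sketch below follows the convention of ade4's mantel.rtest of counting the observed value as a replicate, p = (hits + 1) / (permutations + 1). It is an illustration, not the analysis code. Under this convention, 100,000 replicates with no permuted value reaching the observed one give the smallest attainable p, 1/100,001 ≈ 1 × 10^-5.

```python
import math
import random


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(r):
    """Convert a correlation matrix (list of rows) to distances d = 1 - |r|."""
    return [[1 - abs(v) for v in row] for row in r]


def mantel(d1, d2, permutations=999, seed=0):
    """Permutation Mantel test; returns (observed r, one-sided p-value)."""
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]  # upper triangle
    x = [d1[i][j] for i, j in pairs]
    observed = _pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of d2 together
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if _pearson(x, y) >= observed:
            hits += 1
    # The observed value counts as one replicate, so p is never 0.
    return observed, (hits + 1) / (permutations + 1)
```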


We also validated the subset of questions from two other questionnaires previously shown
to correlate with age; all of these questions also correlate with age in Darwin’s Ark. The
two questionnaires with age-related questions were the Canine Cognitive Dysfunction
Rating (CCDR) scale (110), from which six questions were sourced or paraphrased (table S11),
and the Canine Health-related Quality of Life Survey (CHQLS) (36), which contributed 11 questions
(table S12). In total, 14 questions were assessed (three questions overlap between the
surveys). The PPS scores for all 14 questions are significantly correlated with age, with
an average Pearson correlation of 0.889 (SD: 0.121; range: 0.630-0.99), compared to 0.661
(SD: 0.308; range: 0.0375-0.994) for the other 96 behavioral questions. The age
correlations for these 14 questions are significantly higher than for other behavioral
questions (one-sided t-test; t = 5.06; p = 4.0 × 10^-6; df = 43.7). The direction and magnitude
of the change matched the previously published results.
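The non-integer degrees of freedom (df = 43.7) suggest Welch's unequal-variance t-test, which is the default of R's t.test(). A sketch of the statistic and the Welch-Satterthwaite degrees of freedom follows. The one-sided p-value would then come from the upper tail of the t distribution (e.g., scipy.stats.t.sf); that step is omitted here to stay within the standard library.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite df for mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```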


Exploratory factor analysis 

We performed exploratory factor analysis using the R package ‘nFactors’ (111) on
the behavioral survey questions, using 10,253 dogs with answers to all 110 behavioral
questions. All survey responses were first standardized to mean 0 and standard deviation 1
using the R function normalize(method = "standardize"). The number of factors to extract was
estimated with several heuristic methods, namely Horn’s Parallel Analysis (n = 20), Optimal
Coordinates (n = 20), and Acceleration Factor (n = 2), and the optimal factor number of
20 was selected. After excluding questions with low pattern or structure loadings
(|loading| < 0.3), 19 factors were generated. A varimax orthogonal rotation was applied to
generate a structure matrix with factor loadings for each question. The first 8 factors explained a
cumulative 24.26% of variance (table S3) and were selected for additional analysis (fig.
S4A,B); the remaining factors were discarded. For an additional 6,269 dogs with less than
20% missing data, we generated factor scores after filling missing answers with randomly
sampled values. The age of each dog for each factor was calculated as the mean age at
survey response to the questions included in that factor. Factor scores were retained for
16,077 dogs enrolled before the data freeze date.


We assigned names, adjectives describing low and high scores, and short
descriptions based on the questions captured by each factor. Factor 1 (“Human
Sociability”), which captures questions about social interactions with people, explained
3.60% of variance. Factor 2 (“Arousal Level”) includes questions about a dog’s reaction
to excitement and explained another 4.06% of variance. Factor 3 (“Toy-directed Motor
Patterns”) includes questions about engaging with toys and objects, which may represent
underlying differences in canine motor patterns. Factor 4 (“Biddability”) represents ease
of training and amenability to trained behaviors. Factor 5 (“Agonistic Threshold”)
describes the conditions and contexts in which a dog may display agonistic behavior.
Factor 6 (“Dog Sociability”) highlights dog-directed social interactions. Factor 7
(“Environmental Engagement”) covers questions about a dog’s responsiveness to their
surroundings. Factor 8 (“Proximity Seeking”) describes behaviors aimed at human
contact and proximity. See table S4 for a breakdown of these factors.


Sample collection 

We sent owners saliva collection kits (DNA Genotek PG-100 saliva swabs) to 
sample their dogs, and received and stored a total of 6,909 saliva samples. We 
preferentially selected dogs for sequencing based on survey completeness and enrollment 
date, as well as including dogs from several underrepresented breeds to expand the breed 
calling panel to include the 100 most common breeds in the U.S. Of 1,715 samples 
submitted for low-coverage DNA sequencing, 159 samples (7.4% of 2,155 dogs included 
in the genetic data set) had sequencing funded by owner donations to the Darwin’s Ark 
Foundation, a 501(c)(3) non-profit organization (82-3942341). 


Reference genome assembly 

For all genetic analyses, we used the CanFam3.1 reference genome assembly (NCBI
accession GCF_000000145.2). Chromosome Y was excluded because of the lack of a
high-quality assembly. For imputed data, chromosome X was excluded because its smaller
effective population size (~75% that of autosomes) could affect both imputation and
selection analyses.


Whole genome sequencing and joint variant calling 

We performed high-coverage whole genome sequencing (WGS) of 27 putatively mixed-breed
dogs at an average depth of 45.82x (SD: 9.76x) and completed the first genetic
analysis of this population, dubbed the Mendel’s Mutts cohort (data S2). We performed
joint variant calling on 676 whole genomes of dogs, wolves, and other canids, including 
22 of the 27 mutts, from publicly available sequencing data as well as sequencing data we 
released under BioProject PRJNA683923. All steps of variant calling were performed 
using the Genome Analysis Toolkit (GATK3, nightly version from June 24th, 2016). 
First, GATK Base Quality Score Recalibration (BQSR) was run with standard arguments,
specifying as known canid variants 19,112,082 distinct single-nucleotide
polymorphic sites compiled from the Dog Genome SNP Database (DoGSD), Broad
Institute (BICF), and Axiom. Second, gVCFs were generated using GATK
HaplotypeCaller. Two genotype quality bands were set with upper bounds of 20 and 100
(-GQB 20 -GQB 100). Paths with fewer than 2 supporting k-mers were pruned
(-minPruning 2). Since we were unsure whether any PCR-free samples were included, we set the
PCR indel model to none (-pcrModel NONE). GATK GenotypeGVCFs performed joint
variant calling of all samples together to produce a single VCF. The SNPs and indels
were filtered separately and subsequently recombined. The SNPs were filtered using the 
following parameters: "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || 
ReadPosRankSum < -8.0". Indel filtering used the following parameters: "QD < 2.0 || FS 
> 200.0 || InbreedingCoeff < -0.8 || SOR > 10.0 || ReadPosRankSum < -20.0". After joint 
calling, we removed samples that did not meet a minimum coverage of 10x and a minimum
call rate of 0.8. We split multiallelic sites into biallelic records and normalized and
left-aligned insertions / deletions using the BCFtools norm function with the CanFam3.1
reference FASTA. We then performed haplotype phasing with Beagle (v5).
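The hard-filter expressions above flag a record if any one condition holds (the || operator). A minimal Python sketch of that logic applied to a dict of INFO annotations follows. The dict input and the choice to treat absent annotations as passing are assumptions of this sketch, not GATK's exact behavior.

```python
# Thresholds from the SNP and indel expressions above; a record meeting any
# failure condition is flagged.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}
INDEL_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 200.0,
    "InbreedingCoeff": lambda v: v < -0.8,
    "SOR": lambda v: v > 10.0,
    "ReadPosRankSum": lambda v: v < -20.0,
}


def fails_filter(info: dict, filters: dict) -> bool:
    """True if any annotation present in info meets its failure condition."""
    # Absent annotations are treated as passing (an assumption of this sketch).
    return any(key in info and failed(info[key]) for key, failed in filters.items())
```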


The final VCF, referred to as the Broad-UMass Canid Variants (available at 
https://data.broadinstitute.org/DogData/), contained 46,134,885 anchored variant records, 
with 34,191,821 SNPs and 11,943,064 insertions / deletions for 540 dogs of known breed 
ancestry distributed among ~133 breeds, 28 dogs of mixed breed ancestry (22 of which 
come from the deeply sequenced Mendel’s Mutts cohort described above), 12 dogs of
unknown ancestry, 62 worldwide indigenous or village dogs, 33 wolves, and 1 coyote. 
See data S4 for samples included in the variant call file. The other 5 mutts in the 
Mendel’s Mutts cohort underwent calling for the same set of variants in the Broad- 
UMass VCF using HaplotypeCaller from GATK3 in genotyping mode
‘GENOTYPE_GIVEN_ALLELES’.


Unique and shared genetic variants 

We selected 22,139,829 biallelic SNPs from autosomes and chrX with allele count 
>1 across all 27 mutts and 530 breed whole genomes representing 128 breeds; of these, 
375,474 (1.7%) were unique to mutts and not observed in any sampled breeds; 
11,599,379 (52.4%) were shared between mutts and breeds; and 4,577,549 (20.7%) were 
unique to individual breeds, with each breed carrying, on average, 35,762 ± 68,719
unique SNPs. Another 5,587,427 (25.2%) SNPs were shared across more than one breed and
unobserved in the mutts. To assess the frequencies of population-specific SNPs, we 
randomly sampled n=10 dogs from mutts and each of 13 breeds. 
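The four SNP classes above are mutually exclusive and exhaustive, which is why the reported counts sum to 22,139,829. A sketch of the classification, assuming a hypothetical per-SNP set of the populations in which the alternate allele was observed, with the label "mutt" standing for the Mendel’s Mutts cohort:

```python
from collections import Counter

MUTT = "mutt"  # hypothetical label for the Mendel's Mutts cohort


def classify(carriers):
    """Assign a SNP to one of four mutually exclusive classes.

    carriers: set of population labels in which the SNP's alternate allele was observed."""
    if not carriers:
        raise ValueError("SNP not observed in any population")
    breeds = set(carriers) - {MUTT}
    if MUTT in carriers:
        return "shared mutt-breed" if breeds else "mutt-only"
    return "single-breed" if len(breeds) == 1 else "multi-breed"
```

Tallying `Counter(classify(c) for c in carrier_sets)` over all SNPs then gives the four class sizes.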


Cumulative variant discovery 

We compared the rate of variant discovery by whole genome sequencing of 
individual purebred versus mutt dogs, using a randomly chosen chromosome (chr13) as a 
proxy for the whole genome. From among the 557 total dogs for which 30x WGS data 
were available, we considered three cohorts: one random dog per breed (N=128 possible 
dogs), mutts from Mendel's Mutts (N=27 dogs), and, separately, dogs of each of the four
breeds for which WGS data were available from >27 individuals: golden retriever
(N=36), Yorkshire terrier (N=56), Labrador retriever (N=31), and Leonberger (N=38).
Of the 619,031 chr13 variants discovered using all 557 dogs, we computed the cumulative
fraction discovered using one to ten dogs randomly chosen and ordered from within each
cohort. We computed 95% CIs from the distribution of values across random reorderings
within each cohort.
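A sketch of the cumulative discovery curve with resampled orderings. It assumes each dog is represented by a set of variant IDs, and it uses simple empirical 2.5% and 97.5% percentiles for the interval. The function and parameter names are hypothetical.

```python
import random


def discovery_curve(cohort, all_variants, max_dogs=10, reorderings=1000, seed=0):
    """Fraction of all_variants discovered after sequencing 1..max_dogs dogs from a cohort.

    cohort: list of per-dog variant sets; all_variants: variants found using all dogs.
    Returns, for each number of dogs, (mean, lower 2.5%, upper 97.5%) over random orderings."""
    rng = random.Random(seed)
    total = len(all_variants)
    k_max = min(max_dogs, len(cohort))
    curves = []
    for _ in range(reorderings):
        seen = set()
        curve = []
        for dog_variants in rng.sample(cohort, k_max):  # a random ordering of k_max dogs
            seen |= dog_variants & all_variants
            curve.append(len(seen) / total)
        curves.append(curve)
    summary = []
    for k in range(k_max):
        values = sorted(c[k] for c in curves)
        lower = values[int(0.025 * (len(values) - 1))]  # simple empirical percentiles
        upper = values[int(0.975 * (len(values) - 1))]
        summary.append((sum(values) / len(values), lower, upper))
    return summary
```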


Linkage disequilibrium 

Analysis of linkage disequilibrium was done with PLINK v1.90b6.9. For each 
population in the Broad-UMass Canid Variants VCF, 25 dogs were chosen at random 
from all dogs in the population. We tested seven populations: mixed breed dogs
(Mendel’s Mutts); four breeds (golden retriever, Labrador retriever, Leonberger, and
Yorkshire terrier); village dogs; and wolves. We included mixed breed dogs and
golden retrievers with WGS coverage >30x; for other populations, coverage was >15x.
We first filtered for autosomal, biallelic variants, and then selected 20,000 random
SNPs. For each population, we assessed the extent of LD by measuring r² between each
random variant and all variants within 100 kb. We assessed tagging of variant sets by
measuring r² between each random variant and variant sites included on genotyping
arrays or in the low-pass sequencing GWAS dataset (171,882 for the Illumina HD,
1,011,992 for the Axiom, and 10,355,966 for low-pass sequencing). Within each
population, we analyzed all SNPs with MAF > 0.025 (table S13).
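For unphased genotypes, a common r² estimator is the squared Pearson correlation of allele counts (0, 1, 2), which, as far as we know, is what PLINK 1.9 reports with --r2. The sketch below computes that statistic and, as a hypothetical helper, the best r² between a query SNP and a set of candidate tag SNPs (for example, array sites within the window).

```python
def genotype_r2(g1, g2):
    """Squared correlation of two SNPs' allele counts (0, 1, 2); None marks a missing call.

    Returns None if fewer than two shared calls remain or either SNP is monomorphic."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    if saa == 0 or sbb == 0:
        return None
    return sab * sab / (saa * sbb)


def best_tag_r2(query, candidates):
    """Highest r2 between a query SNP and any candidate SNP; 0.0 if none is informative."""
    values = [r2 for c in candidates if (r2 := genotype_r2(query, c)) is not None]
    return max(values, default=0.0)
```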


Runs of homozygosity in mutts, breeds, and village dogs 

We compared the lengths of detected runs of homozygosity in mutts, dog breeds, 
and village dogs using whole genome sequencing data for dogs in the Broad-UMass 
Canid Variants set (58,308,734 biallelic SNPs) and Mendel’s Mutts cohort (56,797,766 
biallelic SNPs). The settings used to detect ROH in the WGS data were as follows: 
minimum run length of 100kb (--homozyg-kb 100) and minimum SNP count of 100 
SNPs (--homozyg-snp 100), at a density of 1kb per SNP (--homozyg-density 1), with no 
two SNPs more than 500kb apart (--homozyg-gap 500), and no heterozygous genotypes 
tolerated (--homozyg-window-het 0) (112). All remaining options were left at the default
settings of PLINK v1.90b6.21 (19 Oct 2020). We then randomly sampled
n = 464 runs (the mean number of ROH detected per mutt) from the pool of ROH
detected in mutts, purebred dogs, and village dogs, repeating the resampling N = 100 times.


Low-pass sequencing with imputation 

1,715 dogs enrolled in Darwin’s Ark were sequenced at 0.5x to 1.1x
depth on the Gencove sequencing platform. Sequencing reads were processed into
imputed autosomal variant calls with Gencove’s loimpute software (46), using the
copying model described in Li and Stephens, 2003 (113), and an imputation reference
panel of publicly available whole genome sequence data (mean coverage 22.9x
(SD: 14.2x)) for 435 canids: 287 dogs of known pure breed ancestry, 6
dogs of unknown ancestry, 100 worldwide indigenous or village dogs, 36 wolves, and 6
other wild canids (VCF provided by Elaine Ostrander of the National Human Genome
Research Institute; all accessions in data S4). This generated 46,349,043 unfiltered call
site records (32,438,672 SNPs and 13,910,371 indels), at a density of 19.8 (SD:
6.9) variants per kb. We used the imputed genotype probability (GP) score for each
genotype at each call site in each dog for filtering. Filtered at GP >70%, each dog
averaged 32,213,747 (SD: 141,060) SNPs and 13,603,537 (SD: 63,729) indels, including
98% of common variants (minor allele frequency > 1% in dogs with high-coverage WGS
data). These data were merged and filtered according to the procedures described below. See
data S4 for samples in the imputation reference panel.


Genotyping arrays 

440 dogs underwent genotyping on the Axiom Canine Genotyping Array Set A & B 
for 1,268,920 variant call sites (1,267,416 SNPs and 1,504 indels) before we adopted the 
low pass sequencing approach described above. Variant call data for these samples were 
subsequently imputed with loimpute against the same panel described above, also
resulting in 46,349,043 unfiltered call sites with GP scores. Filtered at GP >70%, each
dog averaged 32,006,290 (SD: 157,307) SNPs and 13,497,132 (SD: 50,619) indels. These
data were merged and filtered according to the procedures described below.


Imputation performance and genotype concordance 

We used two approaches to compare the performance of low-pass sequencing plus
imputation to that of genotyping arrays in admixed dogs: (1) running low-coverage
(1.0x ± 0.6x) re-sequencing and genotyping arrays on mutts with WGS data, and (2)
down-sampling WGS data. For the truth set of genotypes, we selected SNPs with MAF
>2% in 676 dogs and observed at least once in the 27 dogs from the Mendel’s Mutts
cohort. We evaluated per-sample genotype concordance with the WGS genotypes using the
BCFtools stats function for unimputed and imputed genotypes from genotyping arrays
(9 dogs, 10 arrays), imputed genotypes from low-pass sequencing (11 dogs, 14 sequencing
runs), and genotypes from down-sampled WGS data (5 dogs).

Low-pass sequencing plus imputation provided a higher density of variant calls
(19.8 ± 6.9 variants per kb, ~40x denser than the Axiom array). We found concordance
between low-pass and WGS of 98.3 ± 0.7% (n=14 runs; ~7.7 million SNPs of MAF >2%
across 676 dogs), similar to but slightly lower than the Axiom array (99.3 ± 0.1%; n=10
arrays; 0.83 million SNPs), but better than imputed array calls (97.3 ± 0.3%; 7.6 million
SNPs), with a lower discordance for imputed heterozygous genotypes (1.09% vs.
1.66%).
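A sketch of the two quantities compared above: overall genotype concordance with the WGS truth set, and discordance at sites where the truth genotype is heterozygous. These are simple definitions chosen for illustration; the concordance tables produced by bcftools stats are computed by that tool and may be defined differently.

```python
def concordance(test_calls, truth_calls):
    """Genotype concordance with a truth set, plus discordance at truth heterozygotes.

    Both arguments map variant ID -> allele count (0, 1, 2); only sites called in both
    are compared. Returns (concordance, heterozygous discordance)."""
    shared = [v for v in truth_calls if v in test_calls]
    if not shared:
        return None, None
    matches = sum(test_calls[v] == truth_calls[v] for v in shared)
    het_sites = [v for v in shared if truth_calls[v] == 1]
    het_discordance = (sum(test_calls[v] != 1 for v in het_sites) / len(het_sites)
                       if het_sites else None)
    return matches / len(shared), het_discordance
```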


Merging and filtering 

Prior to merging, samples processed by low-pass sequencing or genotyping arrays
with imputation were filtered to remove genotypes with genotype probability below 70%
(BCFtools ‘-e 'MAX(GP[*])<0.7'’). Subsequently, VCFs were merged (BCFtools) and
converted to a PLINK data set. SNPs with a minor allele frequency below 2% or missing
in over 20% of individuals were filtered out (PLINK ‘--maf 0.02 --geno 0.20’). Only
biallelic SNPs with extreme deviation from Hardy-Weinberg equilibrium, defined by p-values
below 1x10°° in the exact test with mid-p adjustment and observed/expected
heterozygosity ratios under 0.25 or above 1.0, were excluded. After filtering,
8,518,951 SNPs and 2,155 dogs remained, with a total genotyping rate of 97.5%.
Owner-reported sexes were encoded in the sample information file and confirmed by relative
X-chromosome coverage for sequencing data (SAMtools ‘idxstats’) and the autosomal
genotypes of X for genotyping data; in total, there were 1,084 males and 1,071 females. Six
dogs were flagged as unusual in their ratio of X-chromosome coverage to autosomal coverage;
two of these came from the same household, but we could not confirm sample
mix-ups. Variant IDs were assigned to include chromosome, position, reference
allele, and alternate allele.
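The genotype-level and site-level filters above can be sketched as follows. The first function keeps a call only if its most probable genotype has GP ≥ 0.7, matching the intent of excluding genotypes with MAX(GP) < 0.7. The second applies missingness and minor-allele-frequency thresholds in the spirit of PLINK's --geno 0.20 and --maf 0.02. Both are illustrations, not the BCFtools or PLINK implementations, and the Hardy-Weinberg filter is omitted.

```python
def call_genotype(gp, min_gp=0.7):
    """Most probable allele count (0, 1, 2) from genotype probabilities, or None if below min_gp."""
    best = max(range(3), key=lambda k: gp[k])
    return best if gp[best] >= min_gp else None


def keep_site(calls, max_missing=0.20, min_maf=0.02):
    """Keep a SNP if at most 20% of calls are missing and its MAF is at least 2%.

    calls: allele counts (0, 1, 2) per dog, with None for missing (filtered) genotypes."""
    called = [c for c in calls if c is not None]
    if not called:
        return False
    missing = 1 - len(called) / len(calls)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return missing <= max_missing and maf >= min_maf
```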


Homozygosity and inbreeding 

We scanned for runs of homozygosity (ROH) in the genetic data for the Darwin’s Ark
cohort, using 32,742,462 biallelic SNP genotypes with genotype probability >70% and
PLINK (v.1.9) with the following settings:
minimum run length of 500kb (--homozyg-kb 500) and minimum SNP count of 100
SNPs (--homozyg-snp 100), at a density of 1kb per SNP (--homozyg-density 1), with no
two SNPs more than 500kb apart (--homozyg-gap 500), and only 1 heterozygous
genotype tolerated per window (--homozyg-window-het 0) (112), performing scans
without LD-based pruning (114). We then calculated the autosomal ROH-estimated
coefficient of inbreeding (F_ROH) as the total ROH segment length divided by the total
SNP-covered length across autosomes where ROH detection was possible (115, 116).
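F_ROH is a ratio of lengths. The sketch below approximates the SNP-covered length as the span from the first to the last SNP on each autosome. This is a simplification: it does not exclude the large gaps where ROH detection is impossible. ROH records are assumed to be non-overlapping (chromosome, start, end) tuples, and both function names are hypothetical.

```python
def covered_length(snp_positions):
    """Approximate SNP-covered autosomal length: first-to-last SNP span per chromosome.

    snp_positions: dict mapping chromosome -> list of SNP positions (bp)."""
    return sum(max(p) - min(p) for p in snp_positions.values() if p)


def f_roh(roh_segments, covered_bp):
    """F_ROH = total ROH length / SNP-covered autosomal length.

    roh_segments: iterable of (chromosome, start_bp, end_bp), assumed non-overlapping."""
    return sum(end - start for _, start, end in roh_segments) / covered_bp
```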


Kinship and relatedness 

We measured kinship (κ) between pairs of individuals using the KING-robust kinship
estimator (117) on the GWAS cohort (N = 2,097 dogs with any phenotype data; 2,197,656
pairs). The majority (99%) of pairs were unrelated (κ < 0), and the average kinship score
was -0.232 (SD: 0.298) (fig. S32). Only 28 pairs of dogs (0.0017%; 46 unique dogs)
exceeded a kinship of κ = 0.2 (just under half-siblings).
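To illustrate how the KING-robust estimator behaves, here is the within-population form, κ = (N_Aa,Aa - 2·N_AA,aa) / (N_Aa(i) + N_Aa(j)). N_Aa,Aa counts SNPs where both dogs are heterozygous, N_AA,aa counts SNPs where they carry opposite homozygous genotypes, and N_Aa counts each dog's heterozygous SNPs. The KING software also offers a between-family variant with a different denominator, so this sketch is illustrative only. Identical genotypes give 0.5, and unrelated dogs from one population are expected near 0. Values become negative when a pair shares fewer heterozygous sites and more opposite homozygotes than expected, as happens for dogs of different ancestry.

```python
def king_robust(g1, g2):
    """KING-robust kinship for one pair of dogs (within-population form).

    g1, g2: allele counts (0, 1, 2) at the same SNPs, None for missing calls.
    Returns None if neither dog is heterozygous at any shared SNP."""
    het_het = ibs0 = het1 = het2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        het1 += int(a == 1)
        het2 += int(b == 1)
        het_het += int(a == 1 and b == 1)
        ibs0 += int(abs(a - b) == 2)  # opposite homozygotes share no allele
    if het1 + het2 == 0:
        return None
    return (het_het - 2 * ibs0) / (het1 + het2)
```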


Breed name standardization 

Owner-reported breed names were standardized to unify those referring to the same 
breed and to correct misspellings. In addition to analysis as separate breed populations, 
breeds were assigned to higher-level groupings to check whether ignoring features
that do not necessarily define separate breeds (e.g., ‘wirehaired
dachshund’ and ‘longhaired dachshund’; various sizes of poodle) affected the
relationship between breed and survey data.


Breed standards, stereotypes, and popularity 

The breed standard values for six physical traits (body size, ear shape, number of 
coat colors, white spotting, coat length, and coat type) were obtained from the American 
Kennel Club (53) or, for the non-AKC breeds, breed-specific clubs (see Data and 
materials availability). We collected three-word stereotypes from the American Kennel 
Club breed resources at www.akc.org/dog-breeds (52) and assigned breeds to breed 
groups based on information from the AKC official breed standards (53). The following 
sets of descriptors were treated as the same stereotypes: (1) {bright, clever, intelligent,
smart, very smart}, (2) {devoted, loyal}, and (3) {energetic, exuberant, lively, peppy}.
The breed group “non-sporting” was excluded because it potentially encompasses a broad 
spectrum of working functions (it is defined as “A diverse group of multifunctional dogs 
not generally regarded to be game hunters.” (53)). We collated scores for each breed on 
ten behavioral characteristics from the “At a Glance” tables of the Encyclopedia of Dog 
Breeds (19). We used the mean number of American Kennel Club (AKC) registrations 
reported between 2000 and 2015 as a proxy for breed popularity in the United States (39). 


Breed reference panel 

We selected 871 purebred dogs representing 89 breeds with published genotypes for 
185,805 SNPs from the Illumina CanineHD array (118). Owners submitted data for 12
purebred English shepherds with 214,220 SNPs genotyped on a custom high-density 
Illumina marker platform. Illumina CanineHD array data for 17 Golden Retrievers and 11 
Leonbergers, for which we also have 30x whole genome sequencing data, were imputed 
to test the genotype concordance of imputed Illumina genotyping calls. We subsequently 
imputed genotypes for 32,438,672 SNPs from these arrays. An additional 109 dogs 
representing 43 breeds were genotyped at 1,011,254 SNPs on the Axiom Canine 
Genotyping Array Set A & B and imputed up to 32,438,672 SNPs. Among dogs enrolled 
in Darwin’s Ark, 115 registered purebred dog samples from 54 breeds had low-pass 
sequencing data considered for inclusion. A total of 380 purebred dog samples spanning 
74 breeds with whole genome sequencing calls were considered for inclusion in the breed
reference panel. Among these, 194 overlapped with samples in the imputation panel. 


We performed nearest-neighbor checks using PLINK and found that, for most breeds,
the 3 genetic nearest neighbors of each dog were of the same breed. We identified 14
dogs with nearest neighbors from a different breed, including 8 American Pit Bull
Terriers near Staffordshire Bull Terriers. Three Greyhound samples were found
to be duplicates.


For each of the 101 reference breeds, 12 dogs were selected to be included in the breed
reference panel, prioritized by raw data density and genetic diversity within the breed. 


We retained SNPs that were genotyped in over 80% of dogs and had a minor allele
frequency of at least 5%. These 7,065,140 SNPs shared 2,528,037 positions
with the recombination-based genetic map measured in free-living village dogs (119). For
the remaining positions, centimorgan map distances were linearly
interpolated using R ‘approx()’. To select dense ancestry-informative markers,
SNPs within 5 kb of index SNPs with F_ST > 0.15 between reference breeds and with r² >
0.9 were clumped (because PLINK --clump selects index variants with values below the
threshold, 1 - F_ST was supplied to the clumping function). The final set of local breed ancestry markers
contained 2,468,442 SNPs (on average, 110 SNPs per 100 kb) and 1,212 dogs
(101 breeds with 12 dogs per breed) with a total genotyping rate of 99.44%, which we
used to perform admixture simulations. Similarly, sparser markers for global ancestry
inference were obtained by clumping with 50 kb, F_ST > 0.15, and r² > 0.5, yielding
688,060 global ancestry markers.
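Linear interpolation of centimorgan positions, as done with R's approx(), can be sketched in Python for a single chromosome. Positions outside the map return None, mirroring approx()'s default rule = 1, which returns NA. The function name is hypothetical, and the map positions are assumed to be sorted and unique.

```python
from bisect import bisect_left


def interpolate_cm(map_bp, map_cm, query_bp):
    """Linearly interpolate genetic positions (cM) at query_bp from a sorted map.

    Positions outside the map range return None, like approx() with rule = 1 (NA)."""
    out = []
    for x in query_bp:
        if x < map_bp[0] or x > map_bp[-1]:
            out.append(None)
            continue
        i = bisect_left(map_bp, x)
        if map_bp[i] == x:  # position already on the map
            out.append(map_cm[i])
            continue
        x0, x1 = map_bp[i - 1], map_bp[i]
        y0, y1 = map_cm[i - 1], map_cm[i]
        out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out
```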


Admixture simulations 

We used a Monte Carlo approach (GitHub repo: lindaboettger/ancestry_assignment)
to generate simulated admixed individuals with known ancestry per haplotype. We then
compared the known breed ancestry composition with global ancestry inferred via
ADMIXTURE (120). To simulate admixed individuals, we performed N = 15 generations
of admixture according to the following procedure. N + 1 random individuals from
different breeds were selected to contribute to the admixture. At each iteration,
recombination was simulated to incorporate a new individual. Recombination was modeled
as a Poisson process with, on average, one event per Morgan. This simulation was run on
10 independently drawn data sets of 6 dogs per reference breed to create 1,000 admixed
individuals of known ancestry. Genotype data from each set of simulated individuals were
merged with the remaining 6 dogs per reference breed and filtered for SNPs in the global
breed ancestry panel. We inferred global breed ancestry for simulated individuals using
the supervised mode of ADMIXTURE (random seed = 43) supplied with reference
population assignments. Breeds contributing more than 5% of simulated ancestry were
called correctly more than 90% of the time. Inferred ancestry correlated with true
ancestry out to 12 generations of admixture (R_Pearson = 0.94; p = 2.2x10^-16).
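Modeling recombination as a Poisson process with rate 1 per Morgan means that the distance between successive crossovers is exponentially distributed with a mean of 1 Morgan. The minimal sketch below simulates one meiosis between two parental haplotypes under that assumption. It is not the code in lindaboettger/ancestry_assignment; the function names, chromosome length, and seed are hypothetical.

```python
import random
from typing import List, Tuple


def simulate_crossovers(length_morgans: float, rng: random.Random) -> List[float]:
    """Crossover positions (in Morgans) from a Poisson process with rate 1 per Morgan."""
    positions = []
    pos = rng.expovariate(1.0)  # distance to the first crossover
    while pos < length_morgans:
        positions.append(pos)
        pos += rng.expovariate(1.0)  # exponential gaps give Poisson-distributed counts
    return positions


def recombine(length_morgans: float, rng: random.Random) -> List[Tuple[float, float, int]]:
    """Return (start, end, parent) segments of one recombinant haplotype.

    parent is 0 or 1; the starting parent is random and the source switches at
    every crossover.
    """
    breakpoints = [0.0] + simulate_crossovers(length_morgans, rng) + [length_morgans]
    parent = rng.randint(0, 1)
    segments = []
    for start, end in zip(breakpoints[:-1], breakpoints[1:]):
        segments.append((start, end, parent))
        parent = 1 - parent
    return segments


rng = random.Random(43)
for segment in recombine(1.2, rng):
    print(segment)
```

Repeating the recombine step once per generation, with a new individual each time, layers ancestry from N + 1 founders onto a single simulated genome.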


Global ancestry inference 
We performed unsupervised admixture analysis for 1,073,779 LD-pruned (50 kb and
r² > 0.5) biallelic SNPs genotyped in the Darwin’s Ark genetic data set. We used 23,613
SNPs on chromosome 38 for 5-fold cross-validation of K = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
15, 20, 25, 50, 75, 100, 150, 200} clusters, and identified K between 50 and 100 as the
optimal number of clusters for the Darwin’s Ark genetic cohort (fig. S6). For the
supervised admixture analysis, genotype data from all query dogs were merged with all
reference breed data and filtered for SNPs in the global breed ancestry panel. Global
ancestry from the 101 reference breeds was inferred using the supervised mode of
ADMIXTURE (random seed = 43) supplied with reference population assignments.
Population weights under 1% were discarded from individual ancestries.


Breed set assignments 

We considered two scenarios in which a dog might present as purebred: (i) the dog
was assigned a single owner-reported breed and (ii) the dog was assigned a single
owner-reported breed and registered purebred status. Other dogs, with no breed or two
breeds reported, were presumed to be non-purebred. For breeds included in the breed
reference panel, we determined the rate at which owners correctly identified the
predominant breed ancestry of their dog and the relationship between owner-reported
breed and percent top breed (table S6). We assessed accuracy for the 880 dogs with only
one reported breed, for a subset of 304 dogs described as registered purebred, and for
the remaining 1,186 presumably non-purebred dogs. Global ancestry inference assigned
at least 85% of ancestry to a single breed in most (89.7%) registered purebred dogs.
Therefore, any dog with 85% or more ancestry from a single breed was considered confirmed
as purebred by sequencing. Likewise, dogs with under 45% breed ancestry that did not
match their owner’s report, and that were not reported as a breed our reference panel
cannot detect, were considered mutts.
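The thresholds above can be read as a small decision rule. The sketch below shows one hypothetical reading for a genotyped dog. It drops ADMIXTURE weights under 1% and treats 85% or more from one breed (or registered status) as confirmed purebred. A single reported breed is kept as a candidate unless its ancestry falls under 45%. How breeds outside the reference panel and intermediate cases are handled are assumptions, not the published procedure.

```python
from typing import Dict, List, Set


def classify_genotyped_dog(ancestry: Dict[str, float], reported_breeds: List[str],
                           registered: bool, panel_breeds: Set[str]) -> str:
    """Assign a genotyped dog to a breed set (hypothetical reading of the rules above).

    ancestry maps breed -> fraction of global ancestry (0-1) from supervised ADMIXTURE.
    "candidate_purebred" here means candidate but not confirmed; in the text the
    candidate set also contains every confirmed dog.
    """
    # discard population weights under 1%
    ancestry = {breed: w for breed, w in ancestry.items() if w >= 0.01}
    top_weight = max(ancestry.values(), default=0.0)
    if registered or top_weight >= 0.85:
        return "confirmed_purebred"
    if len(reported_breeds) == 1:
        reported = reported_breeds[0]
        if reported not in panel_breeds:
            # breed we cannot detect: genetics cannot contradict the owner (assumption)
            return "candidate_purebred"
        if ancestry.get(reported, 0.0) >= 0.45:
            return "candidate_purebred"
    # no breed, two breeds, or a single breed contradicted by the genetic data
    return "mutt"


panel = {"Beagle", "Boxer", "Pug"}
print(classify_genotyped_dog({"Beagle": 0.6, "Boxer": 0.4}, ["Beagle"], False, panel))
```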


We defined three breed sets: (1) confirmed purebred dogs were either described as 
registered purebred by the owner, or confirmed by sequencing to have 85% or more 
ancestry from a single breed (3,637 dogs); (2) candidate purebred dogs included all 
confirmed purebred dogs and dogs with owner-reported ancestry from one breed not in 
disagreement with genetic breed ancestry, if available (9,009 dogs); (3) mutts were all 
other dogs including any dogs with <45% genetic breed ancestry in discordance with 
owner reported ancestry (9,367 dogs). Among 217 dogs with disagreement between 
owner reported breed and genetically inferred ancestry, 89 were reported as breeds absent 
from our reference panel and 90 were re-classified as mutts. Extrapolating from our 
analysis of dogs with genetic data, we expect that 89.7% of registered purebred and 
58.2% of dogs with owner-reported ancestry from one breed would, if sequenced, have 
>85% ancestry called from their owner-defined breed. Both rates are far higher than the
3% expected by random chance. That baseline is the mean probability that an owner would
guess the breed correctly across all dogs with breed confirmed by sequencing, assuming
that owners guess each breed at a rate equal to its population frequency.
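Under the stated assumption, the chance baseline reduces to a sum of squared breed frequencies. If owners guess breed b at its frequency f_b, then a dog of breed b is named correctly with probability f_b, and averaging over dogs gives the sum of f_b². The sketch below uses hypothetical breed labels. The 3% figure above comes from the study’s own frequencies, not from this toy input.

```python
from collections import Counter
from typing import Iterable


def expected_random_accuracy(confirmed_breeds: Iterable[str]) -> float:
    """Mean chance of a correct guess when owners guess each breed at its frequency.

    A dog of breed b is guessed correctly with probability f_b, so averaging over
    all dogs gives sum_b f_b * f_b, the sum of squared breed frequencies.
    """
    counts = Counter(confirmed_breeds)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical example: 50 equally common breeds give a 2% baseline
print(expected_random_accuracy(f"breed_{i}" for i in range(50)))
```

When frequencies are concentrated on a few popular breeds, the baseline rises above the equal-frequency value.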


SNP-based heritability estimates 

We used the Genome-wide Complex Trait Analysis (GCTA, version 1.92.3 beta 3) 
(56) software tool to calculate LD scores in 250kb regions using a block size of 10,000kb 
with an overlap of 5,000kb between blocks. In GCTA, we generated a genetic 
relationship matrix (GRM) for the data set of 2,155 dogs, as well as multiple GRMs
calculated from SNPs stratified into LD score quartiles (121). We used restricted
maximum likelihood (REML) analysis for all factor and question scores and 14 physical
traits using the GRM for all SNPs to obtain estimates of h²_SNP for the sample set of all
dogs. We calculated the LD-stratified estimates of h²_SNP for the same traits except when
more than half of the variance components were constrained. 
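The LD-stratified approach fits one GRM per group of SNPs binned by LD score. Only the binning step is sketched below. It assumes that per-SNP LD scores have already been computed (for example with GCTA) and loaded into a dict. The SNP IDs are hypothetical, and GCTA’s own workflow may treat quartile boundaries differently.

```python
import statistics
from typing import Dict, List


def ld_score_quartiles(ld_scores: Dict[str, float]) -> List[List[str]]:
    """Split SNP IDs into four groups by LD-score quartile (lowest to highest)."""
    q1, q2, q3 = statistics.quantiles(ld_scores.values(), n=4)
    groups: List[List[str]] = [[], [], [], []]
    for snp, score in ld_scores.items():
        if score <= q1:
            groups[0].append(snp)
        elif score <= q2:
            groups[1].append(snp)
        elif score <= q3:
            groups[2].append(snp)
        else:
            groups[3].append(snp)
    return groups


scores = {f"snp{i}": float(i) for i in range(1, 9)}
print(ld_score_quartiles(scores))
```

Each group’s SNP list would then be used to build its own GRM, and the GRMs are fitted jointly in REML.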


We also computed the first 10 principal components from the GRM of all SNPs
using GCTA. We then performed heritability analyses in unrelated dogs (no kinship > 0.2)
using the LD-stratified GRM(s) plus the first 10 PC eigenvectors as quantitative
covariates. Heritabilities estimated in this manner were almost perfectly correlated
with those estimated without PC covariates (R_Pearson = 0.974; p = 1.1x10^-8), with an
average fold change of 0.08 ± 0.43 (SD). Confidence intervals overlapped almost
completely for all traits except Q64 (circles before pooping). Additionally, we performed
heritability analysis in unrelated, highly admixed mutts with no breed ancestry detected
over 45%. Unsurprisingly, given the much smaller sample size, these estimates correlated
less tightly with estimates from the full set of dogs (R_Pearson = 0.44, p = 0.0004), with
an average fold change of 0.53 ± 1.24 (SD). Even so, the confidence intervals for most
traits (92%) overlapped.


Analysis of variance 

To assess how much of the variance in traits is explained by age, sex, size and breed,
we ran a one-tailed analysis of variance (ANOVA) on all dogs older than 1 year with
age, sex, size, and breed information. We used the anova_test function in the R package
‘rstatix’ 0.7.0 in R version 4.1.1. We tested both the candidate purebred and confirmed
purebred classifications and included all breeds with >5 dogs. For the candidate dataset,
we had 121 breeds and, on average, 5,968 ± 332 dogs per question/factor; for the
confirmed dataset, we had 78 breeds and 2,251 ± 119 dogs per question/factor. We used
two different models to test how much of the variance in normalized factor and question
scores was explained by age, sex, size and breed. For question 121 (the size question),
which is the source of the size phenotypes, we used a model that excluded size:
[value ~ breed + sex + age]. For all other questions we used [value ~ breed + sex + age +
size]. We corrected for multiple testing using the Benjamini-Hochberg procedure. To
assess whether fur length explains any variance in Q21 (focus in distracting
situations), we used the same procedure with the model [value ~ fur length + breed].
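The Benjamini-Hochberg adjustment, used here and in later sections, can be written in a few lines. This sketch follows the definition used by R’s p.adjust(method = "BH"). It sorts the p-values, scales each by m/rank, and enforces monotonicity working down from the largest p-value. The function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p-values (same definition as R's p.adjust(method = "BH"))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices in ascending p order
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```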


Predictive value of breed, age, and size 

We calculated the improvement in prediction of behavioral traits afforded 
by information on the breed, age, and size of a given dog. To do so, we used a relative 
risk framework. We asked, for example, by what fold information on any of these features
could help a prospective dog owner identify an individual dog with a specified
behavioral trait or combination of traits, relative to a dog selected at random.
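In this framework, the quantity of interest is the ratio of a conditional probability to an unconditional one. The minimal sketch below uses hypothetical toy data and field names. If 40% of all dogs show a trait but 60% of dogs of one breed do, then knowing the breed improves the odds of finding such a dog 1.5-fold.

```python
from typing import Callable, Dict, List

Dog = Dict[str, object]


def fold_enrichment(dogs: List[Dog], has_feature: Callable[[Dog], bool],
                    has_traits: Callable[[Dog], bool]) -> float:
    """Relative risk: P(traits | feature) / P(traits) for a randomly chosen dog."""
    baseline = sum(has_traits(d) for d in dogs) / len(dogs)
    subgroup = [d for d in dogs if has_feature(d)]
    conditional = sum(has_traits(d) for d in subgroup) / len(subgroup)
    return conditional / baseline


dogs = ([{"breed": "X", "retrieves": True}] * 3 + [{"breed": "X", "retrieves": False}] * 2
        + [{"breed": "Y", "retrieves": True}] + [{"breed": "Y", "retrieves": False}] * 4)
print(fold_enrichment(dogs, lambda d: d["breed"] == "X", lambda d: bool(d["retrieves"])))
```

The same function can score a combination of traits by passing a predicate that requires all of them.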


Interactive dashboard 

We developed an interactive website for the public that illustrates the prevalence of 
particular traits in each breed. One version of the site shows the results for all eight 
behavioral factors, and the other, the results for the most heritable questions: Q17 H; Q54
(DOG retrieves objects); Q60 (DOG avoids getting wet); Q121 (When DOG is next to
someone how high are HIS shoulders); Q123 (How much white fur does DOG have); 
Q124 (Is DOG's tail curly); Q125 (What is DOG's ear shape); Q127 (How long is DOG's 
fur on HIS back and sides). We grouped dogs in each breed into three groups based on
their scores: “any” (all dogs); “negative” (dogs in the bottom 25% of scores across all
dogs); and “positive” (dogs in the top 25%). We then empirically measured the
frequency of all possible combinations of responses in each breed. We included both
candidate and confirmed purebred dogs to maximize the number of breeds (minimum of
500 dogs per breed). Input files for the website are included in the data release (see
Data and materials availability).
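The sketch below illustrates the grouping and the combination counts. It assumes that quartile cut points are computed once over all dogs for each trait, and that scores exactly on a cut point fall into the outer group; both are assumptions. Trait names and scores are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

Cutoffs = Tuple[float, float]  # (25th percentile, 75th percentile) over all dogs


def quartile_cutoffs(all_scores: List[float]) -> Cutoffs:
    q1, _, q3 = statistics.quantiles(all_scores, n=4)
    return q1, q3


def in_group(score: float, group: str, cutoffs: Cutoffs) -> bool:
    q1, q3 = cutoffs
    if group == "any":
        return True
    if group == "negative":
        return score <= q1
    if group == "positive":
        return score >= q3
    raise ValueError(f"unknown group: {group}")


def combination_frequency(breed_dogs: List[Dict[str, float]], combination: Dict[str, str],
                          cutoffs: Dict[str, Cutoffs]) -> float:
    """Fraction of a breed's dogs meeting every trait -> group requirement in `combination`."""
    hits = sum(all(in_group(dog[trait], group, cutoffs[trait])
                   for trait, group in combination.items())
               for dog in breed_dogs)
    return hits / len(breed_dogs)


all_dogs = [float(x) for x in range(1, 9)]  # same toy scores for both traits
cutoffs = {"retrieves": quartile_cutoffs(all_dogs), "avoids_wet": quartile_cutoffs(all_dogs)}
breed = [{"retrieves": 1, "avoids_wet": 8}, {"retrieves": 7, "avoids_wet": 1},
         {"retrieves": 8, "avoids_wet": 8}, {"retrieves": 4, "avoids_wet": 4}]
print(combination_frequency(breed, {"retrieves": "positive", "avoids_wet": "any"}, cutoffs))
```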


Population Peculiarity Scoring (PPS) 

We used permutation testing to test whether populations of dogs, as defined by 
breed or age, differed significantly in their survey responses from randomly sampled 
groups on any question or factor. All dogs with any survey responses were included. We 
used R version 4.0 and custom scripts (see Data and materials availability). For each 
permutation, for a given sample size N (table S14), we calculated the mean (the observed 
test statistic) for each question/factor for N dogs sampled from among dogs of each type. 
For each permutation, we also calculated the mean for a random sampling of size N from 
the full dataset (the permuted test statistics). We counted how often the observed test 
statistics for each population were higher than the permuted test statistics. 


We ran a total of 500,000 permutations and summed the counts. We calculated the
one-tailed empirical p-value as the fraction of random samples in which the permuted
statistic is as large as or larger than the observed test statistic. We then converted
(1 - p) into a z score using the standard R function qnorm(), so that the sign of the
z score matches the direction of the original survey score. We calculated two-tailed
p-values corrected for multiple testing by counting how often each observed test statistic
exceeded the maximum of all permuted statistics (i.e., over all questions or factors), or
was less than the minimum, across replicates (122). This max(T) permutation approach
preserves the correlational structure between questions. It is therefore more appropriate
than a Bonferroni correction, which assumes that all tests are independent. When
calculating permutation p-values, we added 1 to the numerator and denominator to avoid
underestimating the p-value (123).
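The counting scheme above can be sketched on a toy dataset. This is a simplified illustration, not the study’s custom scripts, and it makes several assumptions. It assumes complete answers and samples without replacement. For the max(T) step it compares raw means across questions, which presumes comparable scales. It forms the two-tailed corrected p-value by doubling the smaller one-sided value. All names and data are hypothetical.

```python
import random
import statistics
from statistics import NormalDist
from typing import Dict, List, Sequence


def pps_scores(population: Sequence[Sequence[float]], everyone: Sequence[Sequence[float]],
               n: int, n_perm: int, seed: int = 0) -> Dict[str, List[float]]:
    """Permutation z scores and max(T)-adjusted p-values for one population of dogs.

    Rows are dogs and columns are questions/factors; this toy version assumes no missing answers.
    """
    rng = random.Random(seed)
    pop, pool = list(population), list(everyone)
    n_q = len(pool[0])
    exceed = [0] * n_q    # replicates where the permuted mean >= observed mean
    max_hits = [0] * n_q  # replicates where the max over questions of permuted means >= observed
    min_hits = [0] * n_q  # replicates where the min over questions of permuted means <= observed
    for _ in range(n_perm):
        obs_rows = rng.sample(pop, n)    # N dogs from the population (observed statistic)
        perm_rows = rng.sample(pool, n)  # N dogs from the full dataset (permuted statistic)
        obs = [statistics.fmean(row[q] for row in obs_rows) for q in range(n_q)]
        perm = [statistics.fmean(row[q] for row in perm_rows) for q in range(n_q)]
        perm_max, perm_min = max(perm), min(perm)
        for q in range(n_q):
            exceed[q] += perm[q] >= obs[q]
            max_hits[q] += perm_max >= obs[q]
            min_hits[q] += perm_min <= obs[q]
    denom = n_perm + 1  # add-one correction in numerator and denominator
    z, p_adj = [], []
    for q in range(n_q):
        p_one = (exceed[q] + 1) / denom
        # guard (assumption): keep 1 - p above 0 so inv_cdf stays finite
        z.append(NormalDist().inv_cdf(max(1.0 - p_one, 0.5 / denom)))
        p_high, p_low = (max_hits[q] + 1) / denom, (min_hits[q] + 1) / denom
        p_adj.append(min(1.0, 2 * min(p_high, p_low)))  # two-tailed (assumption)
    return {"z": z, "p_adjusted": p_adj}


everyone = [[float(i % 10), float((i * 7) % 10)] for i in range(200)]
population = [row for row in everyone if row[0] >= 8]  # toy group scoring high on question 0
result = pps_scores(population, everyone, n=20, n_perm=500)
print(result["z"], result["p_adjusted"])
```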


We assessed whether the number of dogs available for sampling influenced the PPS
results and found a small but discernible effect in the candidate purebred analysis. For the
candidate purebred dogs, we set the minimum number of dogs required to 25, matching the
size of each sampling. Thus, for some breeds, each “random sampling” included the same
dogs. This allowed us to produce scores for many more breeds (up to 62, compared with a
maximum of 44 if the minimum were 50). However, breeds with fewer dogs available for
sampling tended to have slightly more extreme z scores. In breeds with fewer than 50
dogs, 3.4 ± 1.9% of traits have significant PPS scores, compared with 1.8 ± 1.3% for
breeds with more dogs (p = 0.00059). The number of dogs available for sampling is
weakly inversely correlated with absolute z score (R = -0.14; p=2.4x10°°; N = 7,482). We
note that the association between fewer dogs available for sampling and larger PPS z
scores would make breeds look more differentiated, rather than less. The effect of sample
size is therefore unlikely to alter our conclusion that the behavioral differentiation
between breeds is subtle.


We quantified the effect using an analysis of variance (data S9). The magnitudes, but
not the directions, of the permutation results were affected by the use of less stringent
candidate purebred breed labels. The permutation z scores for candidate breeds were
strongly correlated with scores for confirmed breeds (for factors, Pearson correlation =
0.959, p = 5.95x10^-18; for questions, Pearson correlation = 0.934, p = 2.3x10°°°). On
average, the z scores from the confirmed purebred sets are 0.82 ± 0.61 larger (in either
direction) than the candidate purebred z scores (fig. S13).


We used the PPS results for candidate purebred dogs to assess the validity of breed
stereotypes determined from three different sources (see Breed standards, stereotypes,
and popularity section). For stereotype groups (descriptive words and historic working
roles), we used the t_test function in the R package ‘rstatix’ version 0.7.0 (R version
4.1.1) to test the mean difference in PPS z scores between breeds in the group and breeds
not in the group. We excluded groups with fewer than 3 breeds from analysis. For
quantitative stereotypes, we measured the Spearman correlation between the PPS z scores
and the stereotype score using the cor() function in the R package ‘stats’ version 4.1.1
(R version 4.1.1). For both, we used the Benjamini-Hochberg procedure to correct for
multiple testing.


MuttMix survey 

To assess perceptions of breed ancestry in mixed-breed dogs by non-owner 
observers, we designed a survey hosted at www.MuttMix.org in which participants guess 
the three breeds detected at the largest percentage in each dog. The survey consisted of 30 
mixed-breed dogs with ancestry assignments and one undeclared purebred dog (fig. S33). 
Owners of the 31 survey dogs were asked to provide a portrait and full body image, as 
well as a short video clip of the dog while in motion. In addition to the visual aids, 
owners indicated their dog’s size relative to an average person via a visual graphic and 
noted other physical descriptions such as coat texture, markings or any concealed traits. 
The images and information provided by owners were compiled into the MuttMix survey,
in which participants could guess the top three breeds.


Participants indicated whether they belong to the general public or are a dog 
professional and/or breeder. Participants were provided with 59 breed options to select as 
well as a “no choice” option for the third breed slot. Dogs were displayed in random 
order to participants. Participants were permitted to exit the survey at any time, return 
later, or leave the survey incomplete, but could not skip dogs. The survey launched on 
April 16th, 2018 and closed on June 16th, 2018, collecting responses from 26,639 people 
over a two-month period. Survey responses were compared with genetic breed ancestry, with
any breed call below 5% removed; only breeds offered as survey options were examined.


To calculate the average total percentage of ancestry guessed correctly, we first 
calculated the percentage guessed correctly by each user for each dog individually by 
summing the real percent ancestry attributed to their three breed guesses. We then
calculated the mean of all these values. 
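The two-step average can be sketched as follows. It assumes each guess set is stored as a (dog, guessed breeds) pair and that a guessed breed absent from the dog’s ancestry, including a “no choice” entry, contributes 0. Names and data are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def mean_percent_guessed(guess_sets: List[Tuple[str, List[str]]],
                         ancestry: Dict[str, Dict[str, float]]) -> float:
    """Mean over (participant, dog) guess sets of the true % ancestry covered by the guesses.

    Guessed breeds absent from a dog's ancestry (including "no choice") contribute 0.
    """
    covered = [sum(ancestry[dog].get(breed, 0.0) for breed in set(breeds))
               for dog, breeds in guess_sets]
    return statistics.fmean(covered)


ancestry = {"dog1": {"Beagle": 40.0, "Boxer": 35.0, "Pug": 25.0}}
guess_sets = [("dog1", ["Beagle", "Boxer", "Poodle"]), ("dog1", ["Pug", "Poodle", "no choice"])]
print(mean_percent_guessed(guess_sets, ancestry))  # 50.0
```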


To assess the accuracy of user guesses of breed ancestry for MuttMix dogs, we first 
counted the number of breed guesses for a given dog that were among the top three 
breeds detected by our breed ancestry inference for that dog (or in the top two, if only 
two breeds comprised >5% ancestry). We found that the rate at which MuttMix users
guessed a given breed ancestry is strongly correlated with the popularity of that breed
(fig. S20), with no significant correlation with the position of a given breed in the guess
dropdown menu. The expected rate of accurate guesses was determined using the overall
population percent ancestry assigned to the breed in the “Darwin’s Ark: Full Genetic Set”
dataset (fig. S20). We calculated how often we expect to see each possible combination
of breed guesses (N = 32,509 for guesses of all 3 breeds; N = 1,711 for guesses of just 2
breeds) by random chance if the guess rate for each breed is its overall population
percent ancestry (table S2). For each dog, we then classified each set of breed guesses as
having 0, 1, 2 or 3 breeds correct, and summed these to get the expected rate of guesses
with 1+ breeds correct, 2+ breeds correct and 3 breeds correct. We then calculated the
observed rate of guesses with 1+ breeds correct, 2+ breeds correct and 3 breeds correct
for each dog, and calculated the ratio of the observed rate to the expected rate (and did
the same for guesses of just 2 breeds). We measured the significance of the difference
between observed and expected correct guesses using the chisq.test function in the R
package ‘stats’ version 4.1.1 (R version 4.1.1), and applied the Benjamini-Hochberg
correction (table S15).
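The expected counts can be enumerated directly. The sketch below assumes that each unordered set of distinct breeds is guessed with probability proportional to the product of the breeds’ population frequencies. That weighting is our assumption about how per-breed guess rates combine. The breed names and frequencies are hypothetical. With 59 options, the number of 3-breed and 2-breed sets is 32,509 and 1,711, matching the counts above.

```python
from itertools import combinations
from math import comb, prod
from typing import Dict, Set


def expected_correct_rates(freq: Dict[str, float], top_breeds: Set[str], k: int = 3) -> Dict[int, float]:
    """Chance that a random k-breed guess contains at least c of a dog's top breeds, c = 1..k."""
    mass = {c: 0.0 for c in range(k + 1)}
    total = 0.0
    for combo in combinations(freq, k):
        weight = prod(freq[b] for b in combo)  # assumed guess probability (unnormalized)
        total += weight
        mass[len(top_breeds.intersection(combo))] += weight
    exact = {c: mass[c] / total for c in mass}
    # cumulative: rate of guesses with at least c breeds correct
    return {c: sum(exact[j] for j in range(c, k + 1)) for c in range(1, k + 1)}


print(comb(59, 3), comb(59, 2))  # 32509 1711
freq = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
print(expected_correct_rates(freq, {"A", "B"}, k=2))
```

Dividing each dog’s observed rates by these expectations gives the observed-to-expected ratios; the chi-square test itself was run in R.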


To assess the importance of individual physical traits in participants’ breed choices, 
we compiled binary phenotypes for each of the above traits for each survey dog (see 
Data and materials availability for data files) and constructed a decision stump (a one- 
level decision tree) to assess how presence or absence of a given trait in a mutt impacted 
entropy among MuttMix guesses of the presence of a given breed ancestry. In particular, 
for each breed ancestry option, we calculated how well mutt phenotype for each of eight 
different traits (height, leg length, ear type, coat type, coat length, coat furnishings, white 
spotting, pigmentation) distinguished participant guesses of presence versus absence of 
that ancestry. For all breeds and all traits, we applied a leave-one-out analysis, omitting 
guesses for each mutt in series, to assess the impact of guesses for that mutt on entropy 
reduction. To calculate significance, we randomized trait assignments across mutts, then 
asked whether entropy reductions given true traits were greater than those expected from 
randomly assigned traits. 
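A decision stump on one binary trait reduces to an information-gain calculation. The sketch below uses hypothetical names and toy data. It computes the entropy reduction in presence/absence guesses for one breed, the leave-one-out values, and a permutation p-value from shuffling trait labels across mutts. Using base-2 entropy and an add-one p-value are our choices.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple

Guess = Tuple[str, bool]  # (mutt id, did this participant guess the breed?)


def entropy(labels: List[bool]) -> float:
    """Shannon entropy (bits) of a list of presence/absence guesses."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(guesses: List[Guess], trait: Dict[str, bool]) -> float:
    """Entropy reduction from splitting guesses on a binary trait of the guessed mutt."""
    labels = [g for _, g in guesses]
    gain = entropy(labels)
    for value in (True, False):
        subset = [g for mutt, g in guesses if trait[mutt] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def leave_one_out(guesses: List[Guess], trait: Dict[str, bool]) -> Dict[str, float]:
    """Information gain recomputed with each mutt's guesses omitted in turn."""
    return {m: information_gain([g for g in guesses if g[0] != m], trait) for m in trait}


def permutation_p(guesses: List[Guess], trait: Dict[str, bool],
                  n_perm: int = 1000, seed: int = 0) -> float:
    """Share of trait shuffles across mutts giving at least the observed gain (add-one)."""
    rng = random.Random(seed)
    observed = information_gain(guesses, trait)
    mutts, values = list(trait), list(trait.values())
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        hits += information_gain(guesses, dict(zip(mutts, values))) >= observed
    return (hits + 1) / (n_perm + 1)


trait = {"m1": True, "m2": True, "m3": False, "m4": False}  # e.g. drop ears present?
guesses = [("m1", True), ("m1", True), ("m2", True), ("m3", False), ("m4", False), ("m4", False)]
print(information_gain(guesses, trait), leave_one_out(guesses, trait), permutation_p(guesses, trait))
```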


Linear mixed-effects regression models (LMERs) 

We constructed linear mixed effects regression (LMER) models using the R package
‘lme4qtl’ (124), which extends the package ‘lme4’ (125) (see Data and materials
availability for script file), to measure the relationship of genetic breed ancestry with
behavior question responses, normalized behavioral factor scores, and physical traits.
Dogs with >45% ancestry from any one breed were excluded to avoid bias from recent
breed admixture and from purebred dogs. We included only breeds whose ancestry had a
standard deviation >5% across dogs carrying that breed. Question and factor scores were
treated as the dependent (response) variable. Breed ancestry from each of the remaining
68 breeds was scaled and treated as a fixed effect. Three age brackets were considered as
random effects: 2 years and under, between 2 and 12 years, and 12 years and older. The
covariance matrix for genetic relationships between individual dogs was produced from
the genetic relationship matrix (GRM) (see SNP-based heritability estimates). As the GRM
was not positive definite, we generated a close positive definite version of the
covariance matrix using the R package ‘psych’ function cor.smooth() to provide the random
effects of relatedness. A total of 1,002 dogs were included.


For each factor, we built a model with restricted maximum likelihood (REML) to
obtain unbiased estimates, standard errors, and Wald statistics (t.val) for the fixed
effects of breed on factor score. We performed analysis of variance on each factor model
to obtain the breed F-statistics. We also constructed each factor model using maximum
likelihood (ML) with and without each breed, and performed an analysis of variance to
obtain the likelihood ratio for each breed. We report p-values for the likelihood ratio
tests between the ML models, but not for the REML models, for which deriving p-values is
not appropriate (126).


We obtained the proportion of variance explained by the fixed effects (breed
ancestries) as the marginal Nakagawa’s R² for each factor modeled with and without
restricted maximum likelihood (127) (data S14). The conditional Nakagawa’s R² is the
variance explained jointly by the random effects (the age bracket and kinship covariance
matrix) and the fixed effects (the breed ancestries). It could not be estimated for all
factors because some models were singular, their random-effects structure being too
complex. This complexity derived from the correlation between relatedness and breed
ancestry; after removing the random effect of relatedness, only one factor model
(factor 5) remained singular.
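For a Gaussian mixed model, the marginal and conditional R² of Nakagawa and Schielzeth are usually written as shown below. Here σ²_f is the variance of the fixed-effect predictions (the scaled breed ancestries), σ²_l is the variance of random effect l (age bracket and relatedness), and σ²_ε is the residual variance; the symbols are ours, not the source’s.

```latex
R^2_{\mathrm{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sum_{l} \sigma^2_l + \sigma^2_\varepsilon},
\qquad
R^2_{\mathrm{conditional}} = \frac{\sigma^2_f + \sum_{l} \sigma^2_l}{\sigma^2_f + \sum_{l} \sigma^2_l + \sigma^2_\varepsilon}
```

Only the conditional form needs the random-effect variances σ²_l. This is consistent with the marginal value remaining available for every factor while the conditional value was not.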


Map of genes and open chromatin regions 

We matched dog genes to human homologs with Ensembl gene annotation, GRCh38 
(version 104.38) and CanFam3 (version 104.31), which gave us a set of 16,329 human 
gene-dog gene pairs for gene annotation. 


For functional annotation of coding and non-coding regions, we derived 179,541 
representative open chromatin regions (rOCRs) from 24 ATAC-seq datasets downloaded 
from the BarkBase portal (http://www.barkbase.org/) (102). ATAC-seq datasets were
processed using the standard ENCODE ATAC-seq pipeline (GitHub repo:
ENCODE-DCC/atac-seq-pipeline). In short, trimmed FASTQ reads were aligned to the canine
reference genome canFam3 using Bowtie2 (128). SAMtools and picard-tools were used
for post-alignment filtering. MACS2 (129) and IDR
(https://sites.google.com/site/anshulkundaje/projects/idr) were used for peak calling and 
filtering. 


We used all samples with an IDR FRiP score greater than 0.05 (N=22). ATAC-seq
IDR peaks were filtered based on signal (CPKM above the 10th percentile of all qualified
samples), length (>150bp) and FDR (<10°). We also included peaks from a skeletal muscle
sample and an occipital cortex sample with a more stringent FDR threshold (<10"'°) for
better tissue diversity. A previously described (101) scheme was used to cluster ATAC-seq IDR peaks
across samples, and the peak with the highest signal within each cluster was selected as 
the representative open chromatin region (rOCR) of the cluster. 
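The referenced clustering scheme is not reproduced here. As a simplified stand-in, the sketch below clusters peaks that overlap across samples, either directly or through a chain of overlaps. It then keeps the highest-signal peak in each cluster as the representative region. The Peak fields and toy coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int   # half-open interval [start, end)
    end: int
    signal: float


def representative_peaks(peaks: List[Peak]) -> List[Peak]:
    """Cluster overlapping peaks and keep the highest-signal peak of each cluster."""
    reps: List[Peak] = []
    cluster: List[Peak] = []
    cluster_end = -1
    for peak in sorted(peaks, key=lambda p: (p.chrom, p.start)):
        # a new chromosome, or a gap after the current cluster, closes the cluster
        if cluster and (peak.chrom != cluster[0].chrom or peak.start >= cluster_end):
            reps.append(max(cluster, key=lambda p: p.signal))
            cluster, cluster_end = [], -1
        cluster.append(peak)
        cluster_end = max(cluster_end, peak.end)
    if cluster:
        reps.append(max(cluster, key=lambda p: p.signal))
    return reps


peaks = [Peak("chr1", 100, 300, 5.0), Peak("chr1", 250, 400, 9.0),
         Peak("chr1", 500, 600, 2.0), Peak("chr2", 100, 200, 1.0)]
print(representative_peaks(peaks))
```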


Genome wide association studies (GWAS) using mixed linear models 

We performed genome-wide mixed linear model-based associations on our 
Darwin’s Ark cohort genetic data set (see Data and materials availability) using the 
“leave-one-chromosome-out” approach (MLMA-LOCO) implemented in GCTA (version
1.92.3 beta 3). This approach calculates multiple GRMs so that, for each candidate SNP
(fixed effect), the random effects exclude that SNP’s chromosome. We included categorical
covariates for sex and data type (genotyping or low-pass sequencing), as well as
quantitative covariates for height and age for non-morphological traits. We used the
human thresholds for genome-wide significance (p = 5x10^-8) and suggestive associations
(p = 1x10^-6) owing to the similarity in linkage block length between humans and diverse
dogs (J, 76).


To define regions of association, SNPs in linkage disequilibrium (r² > 0.2 and r² > 0.5)
and within 250 kb of associated index SNPs were clumped and annotated via PLINK
--clump, referencing the gene map described above; results are reported in data S16. We
also performed clumping for SNPs in linkage at r² > 0.8 within 250 kb to compare the
sizes of mapped regions (clumped for r² > 0.8 within 1 Mb) with osteosarcoma-associated
regions found by within-breed studies (83) in greyhounds (22, 22, 96, 90, 372, 82, 182, 72,
32, 198, 52, 105, 1304, and 72 kb), Rottweilers (1208, 330, 18, 19, 21, 487, 131, 16, 20,
26, 1988, 509, 54, 108, and 21 kb), and Irish wolfhounds (746, 1588, 1382, and 737 kb).
We summarized the mean, median, and 25% and 75% quantiles for these
osteosarcoma-associated regions and for our suggestive (p = 1x10^-6) associated regions
for physical and behavioral traits. Single-SNP associations with no SNPs in linkage were
excluded from these summaries. Our regions extend to a median of 5.6 kb (interquartile
range 2.0-14 kb, mean 16.8 kb) around suggestive behavioral loci and 5.7 kb (1.4-22 kb,
mean 26.2 kb) at physical trait loci. By contrast, the within-breed associations for
osteosarcoma mapped at median ranges of 86 kb (interquartile range 57-162 kb) in
greyhounds, 54 kb (21-409 kb) in Rottweilers, and 1 Mb (743 kb-1.4 Mb) in Irish wolfhounds.
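The osteosarcoma summaries above are consistent with linear-interpolation quantiles (R’s default quantile type 7, Python’s method="inclusive"). The sketch below recomputes them from the listed region sizes. For greyhounds, it gives a median of 86 kb and quartiles of 57 and 162.75 kb.

```python
import statistics
from typing import Dict, List

# Osteosarcoma-associated region sizes (kb) from the within-breed studies listed above
REGION_KB: Dict[str, List[float]] = {
    "Greyhound": [22, 22, 96, 90, 372, 82, 182, 72, 32, 198, 52, 105, 1304, 72],
    "Rottweiler": [1208, 330, 18, 19, 21, 487, 131, 16, 20, 26, 1988, 509, 54, 108, 21],
    "Irish wolfhound": [746, 1588, 1382, 737],
}


def summarize(sizes: List[float]) -> Dict[str, float]:
    """Mean, median and 25%/75% quantiles using linear interpolation (R's default type 7)."""
    q25, median, q75 = statistics.quantiles(sizes, n=4, method="inclusive")
    return {"mean": statistics.fmean(sizes), "median": median, "q25": q25, "q75": q75}


for breed, sizes in REGION_KB.items():
    print(breed, summarize(sizes))
```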


The genomic inflation factor (at the 50th percentile) for each study was calculated using
the Python package ‘scipy’ as λ = np.nanmedian(chdtri(1, p_array)) / chdtri(1, 0.5), where
p_array is the array of GWAS p-values and chdtri (scipy.special.chdtri) returns the 1-df
chi-square statistic whose upper-tail probability equals its second argument.
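Below is a dependency-free sketch of the same calculation. For one degree of freedom, the chi-square statistic with upper-tail probability p is the square of the two-sided normal quantile, which is what scipy.special.chdtri(1, p) returns. The expected median under the null is about 0.455. The function names and toy p-values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist
from typing import Iterable


def chi2_1df_from_p(p: float) -> float:
    """1-df chi-square statistic whose upper-tail probability is p (cf. chdtri(1, p))."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; stable for very small p
    return z * z


def genomic_inflation(pvalues: Iterable[float]) -> float:
    """lambda_GC = median observed 1-df chi-square / median of the null 1-df chi-square."""
    chi2 = [chi2_1df_from_p(p) for p in pvalues if not math.isnan(p)]  # like np.nanmedian
    return statistics.median(chi2) / chi2_1df_from_p(0.5)


null_p = [(i - 0.5) / 1001 for i in range(1, 1002)]  # evenly spaced p-values, median 0.5
print(genomic_inflation(null_p))                      # 1.0
print(genomic_inflation([p / 2 for p in null_p]))     # > 1 when p-values are too small
```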


To assess how much phenotypic variance was explained by associated regions, we
derived genetic relationship matrices for the set of SNPs clumped for suggestive
association (p = 1x10^-6) with each trait and for the set of all other SNPs. We then
estimated the partitioned heritability to measure the proportion of total heritability not
attributable to the discovered associations.


As with the heritability estimates, we also ran mixed linear model-based
associations fitting the first 10 principal component eigenvectors as covariates and
excluding 46 related individuals, to achieve added control over population structure.
Overall, we found small shifts in the top associated loci in this context. Out of 588
associations (1,388 SNPs with p < 1x10^-6) detected in the original studies for behavioral
question and factor scores, 71/109 SNPs (65.1%) were still significant (p < 5x10^-8) and
1,145/1,388 SNPs (82.5%) remained suggestive (p < 1x10^-6). Data S16 indicates any
associated locus that did not replicate in this context (p > 1x10^-6).

In the main text, we present the association results without PCs fitted as it is not 
clear that including PCs is more accurate and doing so may constitute an over-correction. 
The mixed linear model-based association (MLMA) analysis already controls for
population stratification and cryptic relatedness as well as or better than the
corrections offered by principal components regression (130, 131). Currently, PC-based
approaches are used when MLMA is not computationally feasible (132). Principal component
regression can be regarded as an approximation of a linear mixed model and may largely
duplicate corrections already incorporated into the analysis (133). Furthermore, unlike
using kinship, selecting the number of PCs to include is subjective and could thereby
bias results (134).


Size prediction model 

We used responses to question 121, “When DOG is standing next to someone of 
average height, how high are HIS shoulders?” with the answer options of ankle high or 
shorter (“0”, 3% of dogs), calf-high (“1”, 30%), knee-high (“2”, 40%), thigh-high (“3”,
24%), or hip-high or higher (“4”, 3%), as phenotypes for building predictive models for
body size. 


A set of 1,730 adult dogs over 18 months old from the Darwin’s Ark cohort was 
used as the training and testing set. To assess the power of the prediction model, we
performed ten-fold cross-validation. The data were split into 10 folds; at each round,
9 folds of dogs were used as the training set and the remaining fold as the testing set.
The validation set consisted of two cohorts. The first was a cohort of dogs recruited at
the 2017 Somerville Dog Festival in Somerville, MA, whose height to the withers (in
inches) was measured at the event. The second was a cohort of dogs whose owners were
instructed to measure the height from the dog’s shoulder to the ground (see Validation of
size phenotypes). In total, 95 dogs were included in the validation set in addition to the
training and testing set. We converted these measurements to the height question Likert
scale by linear interpolation, using ankle-high (0) = 4 in, calf-high (1) = 10 in,
knee-high (2) = 18 in, thigh-high (3) = 25 in, and hip-high (4) = 35 in from the floor,
according to the body measurements of a 5’4” woman.
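The conversion can be sketched as piecewise-linear interpolation between the five anchor heights listed above. Clamping heights outside 4-35 in to the ends of the scale is our assumption, and the function name is hypothetical.

```python
from bisect import bisect_right

# Anchor heights (inches) for scale points 0-4: ankle, calf, knee, thigh, hip
ANCHORS_IN = [4.0, 10.0, 18.0, 25.0, 35.0]


def inches_to_scale(height_in: float) -> float:
    """Map a measured shoulder height (inches) onto the 0-4 size scale by linear interpolation.

    Heights at or beyond the outer anchors are clamped to 0 or 4 (an assumption).
    """
    if height_in <= ANCHORS_IN[0]:
        return 0.0
    if height_in >= ANCHORS_IN[-1]:
        return float(len(ANCHORS_IN) - 1)
    i = bisect_right(ANCHORS_IN, height_in)  # ANCHORS_IN[i - 1] <= height_in < ANCHORS_IN[i]
    lo, hi = ANCHORS_IN[i - 1], ANCHORS_IN[i]
    return (i - 1) + (height_in - lo) / (hi - lo)


print([inches_to_scale(h) for h in (3, 14, 18, 30, 40)])  # [0.0, 1.5, 2.0, 3.5, 4.0]
```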


At each round, we performed a genome-wide association study on the training set,
selected predictor SNPs based on their strength of association, and built a random forest
regression model using the selected SNPs. Once the model was built, we used the testing
set to calculate prediction accuracy and mean squared error. The final prediction
accuracy and mean squared error were averaged across the ten testing sets. We iterated
through different p-value cutoffs, from 1x10° to 1x10°, when selecting predictor SNPs and
chose the cutoff with the best performance on the testing set. We built the model with
the R package ‘randomForest’ and selected a p-value cutoff (1x10°>) and the mtry and ntree
parameters (576 and 500, respectively, determined by iteration) to achieve the best model
performance. We built a single model with 1,730 dogs and 2,733 SNPs. We then validated this
model using our validation set of 95 dogs, independent of the 1,730 adult dogs used as
the training and testing sets.


Our random forest regression model produces continuous size predictions, which we
rounded to the nearest integer to match the 0-4 survey scale. Besides building models
based on GWAS-selected SNPs, we built models based on randomly selected SNPs as a
control, following the same pipeline of splitting the data into ten folds. At each round,
we randomly selected a number of SNPs matched to each p-value cutoff and built models
on them. Prediction accuracy and mean squared error were calculated as described above.
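The sketch below covers the fold assignment and the evaluation step. It assumes random fold assignment and half-up rounding clamped to 0-4; Python’s built-in round() rounds halves to even, so it is avoided here. It also assumes that accuracy is computed on rounded predictions and mean squared error on unrounded ones. The model fitting itself (GWAS plus randomForest in R) is not shown, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def ten_fold_indices(n_dogs: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle dog indices and deal them into k folds of near-equal size."""
    order = list(range(n_dogs))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]


def to_survey_scale(prediction: float) -> int:
    """Round half-up and clamp a continuous size prediction to the 0-4 survey scale."""
    return min(4, max(0, math.floor(prediction + 0.5)))


def accuracy_and_mse(predicted: Sequence[float], observed: Sequence[int]) -> Tuple[float, float]:
    """Share of rounded predictions matching the survey answer, and MSE of raw predictions."""
    rounded = [to_survey_scale(p) for p in predicted]
    accuracy = sum(r == o for r, o in zip(rounded, observed)) / len(observed)
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    return accuracy, mse


folds = ten_fold_indices(1730)
for f, test_idx in enumerate(folds[:2]):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    print(f, len(train_idx), len(test_idx))
```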

Gene set enrichment analysis

To assess the enrichment of sets of functionally related genes, we applied MAGMA 
(version 1.09) (97) on the GWAS summary statistics for 136 traits, including 14 physical 
traits, 114 behavioral questions, and 8 behavioral factors. MAGMA calculated gene-wide 
p-values by combining the p-values of all SNPs inside genes while accounting for gene 
size, number of SNPs in a gene, and LD. Using gene-based p-values, it tested for 
enrichment of association signals in genes belonging to the same set. 


Gene sets were compiled from three main sources. First, we used curated
neuropsychiatric gene sets. The autism spectrum disorder (ASD) gene set included 820
genes from the SFARI database, which centers on genes implicated in autism
susceptibility (8). The obsessive-compulsive disorder (OCD) gene set comprised 62
manually curated genes from OCDB, a database of genes, miRNAs and drugs for OCD (87).
The schizophrenia (SCZ) gene set combined genes from two studies, the 2014 PGC2 GWAS
(89) and the 2018 UK CLOZUK GWAS (90), for a total of 371 genes. Second, to test
whether dog size-associated loci are enriched in GWAS for dog non-physical traits, we
compiled a set of dog size-associated regions based on our size GWAS; the final set
contains 75 regions spanning 4.6 Mb in total. Third, we used the top genes expressed in
GTEx tissues (85), choosing the top 100 expressed genes in each of 25 tissues and 13
brain subregions. For arteries, brain, cervix, colon, esophagus, heart, kidney, and skin,
which were sequenced at the subregion level in GTEx, we merged the subregions into a
single tissue by taking the average expression level for each gene, then chose the top
100 genes. We report raw p-values from MAGMA for each GWAS-gene set pair, as well as
gene-wide p-values for significant enrichments (data S17).


Genetic differentiation of breeds 

We defined a set of 27,674,130 SNPs genotyped in at least 80% of dogs and 50% of
wolves. Samples from Darwin’s Ark were either assigned to a breed population determined
by ancestry inference (>85% breed) or labeled as ‘other_dog’. All available sequencing
and imputed array data collected from purebred samples were assigned to their respective
populations (data S4). We calculated genome-wide normalized population branch statistics
(PBS; implemented in the Python package ‘scikit-allel’) using the Hudson estimator of
fixation (Fst) for each breed (N > 12 dogs) relative to dogs overall and wolves, in 100-kb
sliding windows with a 10-kb step containing at least 10 SNPs (92, 135, 136). Across all
windows, genome-wide empirical p-values based on rank relative to other windows
((r+1)/(n+1)) were calculated for all test statistics (122). A summary of the per-breed
population branch statistic is in data S18.
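For readers less familiar with PBS, the sketch below shows the unnormalized statistic for one window, given Fst values for the three pairwise comparisons (focal breed, other dogs, wolves), together with the rank-based empirical p-value (r+1)/(n+1). It is a simplified illustration: the normalization applied in scikit-allel is not reproduced, and the definitions of r and n used here (and the tie handling that follows from them) are assumptions.

```python
import math
from bisect import bisect_left


def divergence(fst):
    """Convert Fst into a branch-length-like divergence T = -log(1 - Fst)."""
    return -math.log(1.0 - fst)


def pbs(fst_focal_other, fst_focal_wolf, fst_other_wolf):
    """Unnormalized population branch statistic for the focal breed:
    PBS = (T_focal,other + T_focal,wolf - T_other,wolf) / 2."""
    return (divergence(fst_focal_other) + divergence(fst_focal_wolf)
            - divergence(fst_other_wolf)) / 2.0


def empirical_p(values):
    """Rank-based empirical p-value (r + 1) / (n + 1) for every window.

    Assumption: r = number of *other* windows whose statistic is >= this
    window's, and n = number of other windows, so larger values get smaller p.
    """
    ranked = sorted(values)
    n = len(values) - 1
    out = []
    for v in values:
        r = len(ranked) - bisect_left(ranked, v) - 1  # exclude the window itself
        out.append((r + 1) / (n + 1))
    return out


print(empirical_p([0.1, 0.5, 0.3, 0.9]))  # [1.0, 0.5, 0.75, 0.25]
```

Note that `divergence` is undefined for Fst = 1 (complete fixation), so any real pipeline needs a cap or filter at that boundary.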


For each trait-associated locus (p < 1x10^-6) from our genome-wide association studies,
we permuted the region of combined overlapping LD-based clumps ~100,000 times on the same
chromosome using the BEDTools shuffle function, used BEDTools map to obtain max(PBS) for
each real and permuted locus, compared each real locus against the distribution of its
permuted loci for each breed (data S19), and generated z scores. We divided locus tests
into physical trait, behavioral question, and behavioral factor associations, performed a
one-tailed Student’s t-test of whether the observed max(PBS) in associated loci exceeded
that expected by random chance, and corrected for multiple testing using the
Benjamini-Hochberg procedure (data S20).
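A minimal sketch of the per-locus permutation z score and the Benjamini-Hochberg adjustment described above. Uniform placement of a same-length interval on the same chromosome stands in for BEDTools shuffle, and a simple overlap scan stands in for BEDTools map; the `windows` tuple layout, the default number of permutations, and the seed are illustrative assumptions. The downstream one-tailed t-test on the z scores is not shown.

```python
import random
from statistics import mean, stdev


def max_stat_in(windows, start, end):
    """Max statistic over windows overlapping [start, end); None if none overlap.
    windows: list of (win_start, win_end, value) on one chromosome."""
    vals = [v for ws, we, v in windows if ws < end and we > start]
    return max(vals) if vals else None


def locus_z(windows, start, end, chrom_len, n_perm=1000, seed=0):
    """z score of a locus's max statistic against same-length intervals placed
    uniformly at random on the same chromosome."""
    rng = random.Random(seed)
    length = end - start
    observed = max_stat_in(windows, start, end)
    null = []
    for _ in range(n_perm):
        s = rng.randrange(0, chrom_len - length + 1)
        v = max_stat_in(windows, s, s + length)
        if v is not None:  # skip placements that fall in window-free gaps
            null.append(v)
    return (observed - mean(null)) / stdev(null)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, p-values of 0.01, 0.04, 0.03, and 0.5 adjust to 0.04, about 0.053, about 0.053, and 0.5. Unlike BEDTools shuffle, this sketch does not exclude assembly gaps or other masked regions.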


We also tested whether the allele frequencies at the most associated SNPs tended to
differ more among breeds than at SNPs sampled at random across the whole genome, as might
be expected if traits were breed-differentiated (fig. S34). For each SNP, we calculated the
maximum Fst observed between any of the top ten breeds and the full dog population. We
measured whether SNPs associated with behavioral traits and physical traits tended to
have higher max Fst than 29,903 randomly sampled SNPs using a one-sided t-test (R
package ‘stats’ version 4.1.1). For physical traits, the maximum Fst observed for the top
SNP in each locus (1 Mb) was slightly higher than for random SNPs (mean = 0.150 vs.
0.145; t-test p_1-sided = 0.026), but this was not true at behavioral trait loci (mean = 0.142
vs. 0.145; t-test p_1-sided = 0.95). For physical traits, the most strongly associated loci
(p < 1x10^-12) tended to have higher breed Fst values, consistent with breed differentiation
(mean = 0.33 vs. 0.145; t-test p_1-sided = 0.0023).
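The per-SNP comparison above can be illustrated with the per-SNP form of Hudson's Fst estimator as given by Bhatia et al. (2013), the maximum over breeds, and Welch's two-sample t statistic, which is what R's `t.test` computes by default (`var.equal = FALSE`). This is a sketch under stated assumptions: it assumes the same Hudson estimator used for PBS, takes allele frequencies and chromosome counts as inputs, and leaves out the one-sided p-value, which would be the upper tail of a t distribution with `df` degrees of freedom (for example via `scipy.stats.t.sf(t, df)`), to keep the block dependency-free.

```python
import math
from statistics import mean, variance


def hudson_fst(p1, n1, p2, n2):
    """Per-SNP Hudson Fst estimator in the form given by Bhatia et al. (2013).

    p1, p2: allele frequencies in the two populations;
    n1, n2: numbers of sampled chromosomes (assumed).
    Undefined (division by zero) when the SNP is monomorphic in both.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


def max_breed_fst(breed_freqs, p_all, n_all):
    """Max Fst between any breed and the full dog population at one SNP.
    breed_freqs: {breed: (allele_frequency, n_chromosomes)}."""
    return max(hudson_fst(p, n, p_all, n_all) for p, n in breed_freqs.values())


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

Applied here, `x` would hold max Fst values at the top associated SNPs and `y` those at the randomly sampled SNPs; a positive t supports higher breed differentiation at associated loci.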


Supplementary Text 


Marketing of DNA tests for breed ancestry 

To determine whether breed testing was being marketed as a behavioral predictor, 
we searched the most popular genetic testing services and media articles for references to 
determining behavior, personality or trainability from a DNA test and found several 
examples: 


A. Wisdom Panel: 
“Consider learning more about their breed background with a Wisdom Panel™ DNA 
test. The insights you’ll gain will allow you to tailor training and care to your dog’s 
unique needs and preferences—helping you build your special bond.” 
Source: Why dogs have favorite people (and how to make sure your pup picks you).
https://web.archive.org/web/20210705161645if_/https://www.wisdompanel.com/en-us/blog/why-dogs-choose-favorite-people (2021).


B. Amazon (marketing for Embark):
“Seeing the breeds your dog inherited will help you understand their personality
along with what keeps them happy. Do they need a lot of exercise? Are they food-motivated?
Do they like a good brain tease? Adapt your care routine based on their breed results.”
Source: Embark | Dog DNA Test | Breed & Health Kit | Breed Identification & Canine Genetic Health
Screening. https://web.archive.org/web/20210705162556/https://www.amazon.com/dp/B01EINBA76


C. Orivet Genetic Petcare:
“Collect your own Dog's DNA with a simple cheek swab. Orivet will compare your
dog's DNA to hundreds of genetic markers of the most common known breeds to
uncover ... Insights into Your Dog's Personality and Behaviour”
Source: Orivet Genetics.
https://web.archive.org/web/20210122163410/https://www.orivet.com/store/canine-mixed-breed-screen/dog-breed-identification-test.


D. DNA My Dog:
“DNA My Dog's simple cheek swab DNA test lets you learn the breeds in your dog
and gain insight into the unique genetic background of your dog including the history
of their breed, personality traits, exercise levels, and so much more!”
Source: DNA My Dog: Fast, easy and completely painless.
https://web.archive.org/web/20210503213607/https://dnamydog.com/.


E. VCA Pet Hospitals:
“Every purebred has certain established physical and personality traits that provide
owners with an idea of the type of pet they are getting. Knowing the breeds that go
into a mix can help the owners make better guesses about the size, temperament,
energy levels, and exercise requirements their unique pet may have.”
Source: Genetic (DNA) Testing.
https://web.archive.org/web/20200926224943/https://vcahospitals.com/know-your-pet/gene


Overview of sample sizes and inclusion/exclusion criteria for all analyses 

The following table describes the source of each data set and its samples, the inclusion
and exclusion criteria, and the maximum final sample size for all analyses. Sample sizes
for several analyses vary depending on how many dogs have survey responses for a given trait.


Data Set | Criteria | Sample Size | In Figure 1F
Darwin’s Ark: Full Survey Set | Enrolled prior to November 15, 2019 | 18,385 dogs | surveyed
Darwin’s Ark: Survey Set with Covariates | Dogs in “Darwin’s Ark: Full Survey Set” phenotyped for age, sex, and height | 14,327 dogs | sex, age, and size info
Phenotype Validation: Concordance with DPQ | Owner completed all survey questions derived from published Dog Personality Questionnaire | 10,253 dogs |
Phenotype Validation: Owner-measured Height | Owner completed a size measurement worksheet and responded to size survey question | 337 dogs |
Phenotype Validation: Staff-measured Height | Dog was measured at in-person event | 38 dogs |
Phenotype Validation: Breed-average Height | Owner reported dog to be a purebred of a breed with average heights reported by kennel club (90 breeds) | 2,025 dogs |
Exploratory Factor Analysis: Discovery Set | Owner answered the first 110 survey questions | 10,253 dogs |
Exploratory Factor Analysis: Scored Set | Discovery set and any dog with responses to >80% of first 110 questions | 16,522 dogs |
Mendel’s Mutts | Dogs of unknown breed ancestry that underwent high coverage, whole genome sequencing | 27 dogs |
Darwin’s Ark: Full Genetic Set | Dogs genotyped or sequenced prior to November 15, 2019 | 2,155 dogs | genetic data
Darwin’s Ark: Genotyped Set | Dogs genotyped on the Axiom Canine Genotyping Array Set A & B prior to adoption of low-pass sequencing approach | 440 dogs |
Darwin’s Ark: Low-pass Set | Dogs with low coverage genome sequencing on Gencove platform | 1,715 dogs |
Heritability Analysis | In “Darwin’s Ark: Full Genetic Set” and in “Darwin’s Ark: Survey Set with Covariates”, and phenotyped for trait | varies by trait, at most 1,967 dogs |
Breed Classification: Confirmed Purebred | Either described as registered purebred by the owner, or confirmed by sequencing to have 85% or more ancestry from a single breed | 3,637 dogs | confirmed breed
Breed Classification: Candidate Purebred | All confirmed purebred dogs (“Breed Classification: Confirmed Purebred”) as well as dogs with owner-reported ancestry from one breed | 9,009 dogs | candidate breed
Breed Classification: Mutt | All dogs not classified as either confirmed or candidate purebred | 9,376 dogs | mutt
Breed Classification: Highly Admixed Mutt | All dogs in “Darwin’s Ark: Full Genetic Set” with <45% ancestry from any single breed in admixture analysis | 1,071 dogs |
Analysis of Variance | In “Breed Classification: Candidate and Confirmed Purebred” and in “Darwin’s Ark: Survey Set with Covariates” and in “Darwin’s Ark: Full Genetic Set” | 6,797 dogs | ANOVA pool
Analysis of Variance: Confirmed Set | In “Breed Classification: Confirmed Purebred” and in “Darwin’s Ark: Survey Set with Covariates”. Phenotyped for trait in question | varies by trait, on average 2,251±119 dogs across 78 breeds |
Analysis of Variance: Candidate Set | In “Breed Classification: Candidate and Confirmed Purebred” and in “Darwin’s Ark: Survey Set with Covariates”. Phenotyped for trait in question | varies by trait, on average 5,968±332 dogs across 121 breeds |
Relative Risk Analysis | In “Breed Classification: Candidate and Confirmed Purebred” and phenotyped for trait in question | varies, at most 266 dogs for a breed x factor pair |
Population Peculiarity Scoring | All dogs in “Darwin’s Ark: Full Survey Set”. Groupings described in methods | 18,385 dogs | PPS pool
LMER Analysis | Dogs in “Darwin’s Ark: Survey Set with Covariates”, “Exploratory Factor Analysis: Scored Set”, and in “Breed Classification: Highly Admixed Mutt” | 1,002 dogs | LMER pool
MuttMix Survey: Dogs | Dogs in “Darwin’s Ark: Low-pass Set” or “Mendel’s Mutts”. Owners did not know their dog’s ancestry and submitted photos and videos | 31 dogs |
MuttMix Survey: Participants | All participants | 26,639 people |
Genome-wide Association Analysis: All Dogs Set | In “Darwin’s Ark: Full Genetic Set” and in “Darwin’s Ark: Survey Set with Covariates”. Phenotyped for the trait in question | varies by trait, at most 1,967 dogs | GWAS pool
Genome-wide Association Analysis: Highly Admixed Set | In “Breed Classification: Highly Admixed Mutt” and in “Darwin’s Ark: Survey Set with Covariates”. Phenotyped for the trait in question | varies by trait, at most 1,071 dogs |
Size Prediction Models: Training and Testing Set | In “Darwin’s Ark: Full Genetic Set” and phenotyped for height. Dogs younger than 18 months were excluded | 1,730 dogs |
Size Prediction Models: Validation Set | In “Darwin’s Ark: Full Genetic Set” and either “Phenotype Validation: Owner-measured Height” or “Phenotype Validation: Staff-measured Height”. Dogs younger than 18 months were excluded | 95 dogs |


[Fig. S1 plot: DPQ question-question correlations; axes: Darwin’s Ark vs. Dog Personality Questionnaire (range -0.8 to 0.8)]


Fig. S1. Concordance between the original Dog Personality Questionnaire (DPQ) and the
Darwin’s Ark implementation of DPQ items. The DPQ publication reported question x question
correlation data from 2,556 dogs for 48 questions also included in Darwin’s Ark (37). We
computed correlation distances (d = 1 - |r|) for responses from 10,254 dogs in Darwin’s Ark
and performed a Mantel test between the DPQ and the Darwin’s Ark correlation distances
using the R package ‘ade4’ with 100,000 replicates. The correlation was nearly perfect
(Mantel’s correlation = 0.95; p = 1x10^-5).
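The Mantel test in Fig. S1 compares two distance matrices by correlating their off-diagonal entries and assessing significance by jointly permuting the rows and columns of one matrix. The sketch below is a plain-Python illustration, not the ade4 implementation; it computes the one-sided permutation p-value as (count + 1) / (n_perm + 1), a formula that, to our knowledge, matches ade4's randomization tests. Under that formula, 100,000 permutations cannot produce a p-value smaller than 1/100,001, roughly 1x10^-5. The function names and defaults are illustrative.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corr_distance(r):
    """Correlation distance d = 1 - |r| for a square correlation matrix."""
    return [[1 - abs(v) for v in row] for row in r]


def mantel(d1, d2, n_perm=999, seed=0):
    """Mantel statistic (Pearson r between upper triangles) and one-sided
    permutation p-value, permuting rows and columns of symmetric d2 together."""
    k = len(d1)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    v1 = [d1[i][j] for i, j in pairs]
    observed = pearson(v1, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    idx = list(range(k))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        permuted = [d2[idx[i]][idx[j]] for i, j in pairs]
        if pearson(v1, permuted) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

With the 48 shared DPQ questions, `d1` and `d2` would be 48 x 48 correlation-distance matrices, giving 1,128 question pairs per permutation.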
[Fig. S2 panels: (A) Q121. “When your dog is standing next to someone of average height, how high are their shoulders?” Response options and relative height scores: ankle high = 0, calf high = 1, knee high = 2, thigh high = 3, hip high = 4. (C) Owner-measured height (cm) by response (n = 10, 106, 125, 83, and 13 dogs from ankle high to hip high); R = 0.84, p = 5.98e-93. (D) Staff-measured height (cm) by response; R = 0.86, p = 2.9e-12. (E) Breed standard height (cm) by response.]
Fig. S2. Validation of surveyed size phenotypes against real measurements and
reference data. (A) Darwin’s Ark question 121 asks owners to select their dog’s
shoulder height relative to an average person on a five-step scale, with options from
“ankle-high or shorter” to “hip-high or higher”. The survey response is converted into a
relative height score ranging from 0 to 4. (B) For a subset of dogs, we asked owners to
measure their dogs using a measuring tape and a worksheet we provided. Panel 4
(height-to-withers) is the measurement coarsely captured by question 121. (C) Owner-measured
height-to-withers is strongly correlated with owner responses on question 121 (N = 337
dogs; Pearson’s R = 0.84; t = 28.885, df = 335, p = 5.98x10^-93). (D) Staff-measured
height-to-withers for 38 dogs (collected at the Somerville Dog Festival in 2017) is strongly
correlated with owner responses on question 121 (N = 38 dogs; Pearson’s R = 0.86; t = 10.284,
df = 36, p = 2.9x10^-12). (E) Breed average height, as defined in the AKC breed standard, is
strongly correlated with owner responses on question 121 for purebred dogs (N = 2,025;
Pearson’s R = 0.85; t = 71.268, df = 2,023, p < 2.2x10^-16). (F) For the 31 dogs in MuttMix,
participants were given each dog’s size on the Q121 scale, as illustrated using the graphic
shown.
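Each Pearson correlation reported in Fig. S2 comes with a t statistic on n - 2 degrees of freedom. The sketch below encodes the Q121 scoring (response to a 0-4 score) and the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2). The dictionary keys are illustrative labels, not the survey's stored values.

```python
import math

# Relative height score assigned to each Q121 response (labels are illustrative).
HEIGHT_SCORE = {"ankle high": 0, "calf high": 1, "knee high": 2,
                "thigh high": 3, "hip high": 4}


def pearson_t(r, n):
    """t statistic and degrees of freedom for a Pearson correlation r on n pairs:
    t = r * sqrt(n - 2) / sqrt(1 - r**2), df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r * r), df


t, df = pearson_t(0.84, 337)
print(round(t, 1), df)  # 28.3 335
```

With the rounded R = 0.84 and N = 337 this gives t of about 28.3 on 335 df; the reported t = 28.885 presumably reflects the unrounded correlation.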


[Fig. S3 panels: (A) Question #123: “How much white fur does DOG have? Select the image with the closest amount of white.” (B) Question #125: “What is DOG's ear shape? Select the image with the closest ear shape.” Options include 3. Rose ear or button ear; 4. Pricked ears; 5. Surgically cropped ears; 6. Not sure. (C) Question #127: “How long is DOG's fur on HIS back and sides? Measure it against your finger.” Options: 1) shorter than first knuckle; 2) medium (second knuckle); 3) long (finger length or longer).]


Fig. S3. For three physical trait questions other than size (table S1), owners were
asked to choose from a range of options shown in an accompanying image. Questions were
designed to be as easy to answer as possible. (A) For question 123 (white coat color),
options range from all white to all black. Images were sourced from (137). (B) For
question 125 (ear shape), options range from long and dropped to pricked. Two responses
(surgically cropped ears and not sure) were excluded from subsequent analyses. (C) For fur
length, owners were asked to respond using an easy-to-access measuring tool: their own
finger. One other physical trait had multiple options but no accompanying image. For coat
color (not shown), owners chose one or more options from a list of colors including:
(1) white; (2) red (from pale peach to dark red or liver colored); (3) yellow (from pale
cream to orange, gold, fawn, or wheaten); (4) gray (slate, blue gray, charcoal, or silver);
(5) chocolate brown; (6) pure black; (7) merle; (8) brindle.


[Fig. S4 panels: (A) between-factor correlation matrix for factors 1-8; (B) “Non Graphical Solutions to Scree Test”: eigenvalues (>mean = 24), parallel analysis (n = 20), optimal coordinates (n = 20), acceleration factor (n = 2); (C) factor score distributions for factors 1-8 (N = 12,516; 12,487; 13,256; 14,586; 13,753; 14,350; 12,987; 13,544).]
Fig. S4. Exploratory factor analysis. To reduce the dimensionality of the survey data
and identify underlying associations between questions, we performed exploratory factor
analysis on 1,127,830 total survey responses from 110 questions for 10,253 dogs.
(A) Between-factor correlations show that most factor pairs are only weakly or moderately
correlated. Factor numbers are listed along the diagonal, and Pearson's r values below the
diagonal. (B) Scree plot for the survey data used to estimate the optimal number of factors;
test algorithms include eigenvalues, parallel analysis, optimal coordinates, and
acceleration factor. The first 8 factors explained a cumulative 24.26% of variance.
(C) Distribution of factor scores for each factor in Darwin’s Ark: Full Survey Set, with
the three horizontal lines marking quantiles 0.25, 0.5, and 0.75.
[Fig. S5 plot; x-axis: distance between SNPs (kb)]


Fig. S5. Decay of linkage disequilibrium (LD) in dog populations with different
histories and in wolves. LD extends over shorter distances in mutts (blue) than in dog
breeds (green), but over slightly longer distances than in outbred village dogs (red) and
substantially longer distances than in wolves (black). For each population, 25 dogs were
chosen at random from all dogs/wolves in the population. We assessed the extent of LD by
measuring r² between each of 20,000 randomly selected SNPs and all variants with minor
allele frequency > 0.025 within 100 kb of the index SNP.
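A minimal sketch of the LD calculation behind Fig. S5, assuming r² is taken as the squared Pearson correlation of unphased genotype dosages (0, 1, or 2 copies of an allele) rather than estimated from haplotypes. The input layout (a list of positions and a parallel list of dosage vectors) and the function names are hypothetical.

```python
def r_squared(g1, g2):
    """LD r^2 as the squared Pearson correlation of genotype dosages (0/1/2)
    across the same individuals; undefined if either SNP is monomorphic."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    return cov * cov / (v1 * v2)


def minor_allele_freq(g):
    """Minor allele frequency from diploid dosages."""
    p = sum(g) / (2 * len(g))
    return min(p, 1 - p)


def ld_decay_pairs(positions, genotypes, index, max_dist=100_000, min_maf=0.025):
    """(distance_bp, r^2) between an index SNP and every other SNP within
    max_dist bp whose minor allele frequency exceeds min_maf."""
    out = []
    for j, pos in enumerate(positions):
        dist = abs(pos - positions[index])
        if j != index and dist <= max_dist and minor_allele_freq(genotypes[j]) > min_maf:
            out.append((dist, r_squared(genotypes[index], genotypes[j])))
    return out
```

Repeating `ld_decay_pairs` for 20,000 random index SNPs per population and averaging r² within distance bins would give decay curves of the kind plotted.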
[Fig. S6 plot: 5-fold cross-validation error (y-axis) vs. K (x-axis, 0-200); unsupervised ADMIXTURE analysis on chromosome 38 with LD-based pruning (50 kb, r² > 0.5)]


Fig. S6. Cross-validation of unsupervised admixture analysis in the Darwin’s Ark
cohort (N = 2,155). We tested a wide range of cluster numbers (K) and achieved the
lowest 5-fold cross-validation error between K = 50 and K = 100.
Fig. S7. Breed ancestry in mutts in the Darwin’s Ark cohort. (A) Our breed calling
pipeline accurately measures breed ancestry in simulated mixed breed dogs, with inferred 
ancestry strongly correlated with the known (simulated) breed ancestry. (B) Coefficients 
of inbreeding estimated from the total length of autosomal runs of homozygosity divided 
by SNP-covered total chromosome length run higher in confirmed purebred dogs than in 
mutts (pr-tes=1.7x10°!7*; t=28.4, df=776.8). (C) The most common ancestry across all 
dogs is American pit bull terrier, followed by Labrador retriever and chihuahua. The 
proportion of the ancestry coming from purebred dogs vs. mutts varies by breed. An 
exceptionally high proportion of dogs with poodle ancestry (yellow asterisks) are 45-85% 
poodle, a range that includes F1 poodle crosses. (D) The proportion of ancestry by breed 
in the dog population varies by region of the United States, possibly reflecting differences 
in breed popularity in the past. 
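The inbreeding coefficient in Fig. S7B (total autosomal ROH length divided by SNP-covered chromosome length) can be sketched as below. The per-chromosome dictionaries and the merging of overlapping ROH calls are assumptions made for illustration, not a description of the ROH-calling software's output.

```python
def merged_length(intervals):
    """Total bp covered by (start, end) intervals, merging any overlaps."""
    total, cur_start, cur_end = 0, None, None
    for s, e in sorted(intervals):
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping call: extend the run
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def f_roh(roh_by_chrom, covered_by_chrom):
    """F_ROH = total autosomal ROH length / SNP-covered autosomal length.

    roh_by_chrom: {chrom: [(start, end), ...]} ROH calls in bp (hypothetical layout)
    covered_by_chrom: {chrom: bp spanned by genotyped SNPs}
    """
    roh_bp = sum(merged_length(calls) for calls in roh_by_chrom.values())
    return roh_bp / sum(covered_by_chrom.values())


print(f_roh({"1": [(0, 10), (5, 20)], "2": [(0, 30)]}, {"1": 100, "2": 100}))  # 0.25
```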
Fig. S8. Controlling for population structure via principal component analysis does
not change SNP-based heritability estimates. We estimated heritability with standard
errors using restricted maximum likelihood with LD score correction in two models: (1) all
dogs with no principal component covariates (red) and (2) all dogs except 46 dogs from
highly related pairs (kinship > 0.2), with the top 10 principal components included (blue).
(A) Heritabilities estimated by both models were highly correlated (Pearson’s R = 0.974,
p=1.06x107%). (B) The heritability estimates for physical traits did not differ between 
models, and the estimates for behavioral questions and factors rarely differed. The most
dramatic shifts in heritability were observed for the question “Circles before pooping”
and factor 4 (biddability).
Fig. S9. Breed explains a larger fraction of the variance in behavior phenotypes than
size, sex, or age. (A) Breed has a larger effect than age, sex, or size, measured as
generalized eta squared (ges), on both questions and factors, but the effect is still modest
except for breed-defining physical traits. (B) The effect of breed is strongly correlated
with the heritability of a trait in the pet dog population. (C) Breed effect sizes for
physical traits, physical trait related questions, and motor patterns are significantly
higher than for other behavioral questions (Wilcoxon test with alternative = "greater" and
BH correction). (D) The variance explained by breed in ANOVA analysis of candidate
purebred dogs (6,832 dogs in 122 breeds with >=5 dogs/breed) is strongly correlated with
results from the confirmed purebred set, suggesting that the less stringent classification
adequately captures the effect of breed. (E) Age has ges > 0.1 (labeled peaks) for three
questions, sex for one question, and size for none.
Fig. S10. Distributions of factor scores (based on owner surveys) in breeds differ
somewhat from the distribution in all dogs. For each factor, some breeds have distributions
that are significantly different (FDR-adjusted p < 0.05, red lines) from the distribution in
all dogs (grey area), although most do not (black lines). Differences were measured using
the Kolmogorov-Smirnov test, a nonparametric test of the equality of continuous
distributions, with Benjamini-Hochberg FDR correction. Red text shows the number of
breeds that are significantly different. All breeds with >50 dogs were included.
Fig. S11. Value of breed in predicting factor scores for individual purebred dogs.
Fold difference (and 95% CI) in the probability that a dog of a given breed will score in
the top quartile for each factor, relative to a random dog. Red indicates a fold probability
significantly greater than 1; blue indicates significantly less than 1.
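A minimal sketch of the point estimate behind Fig. S11, assuming the top-quartile threshold is the 75th percentile of scores across all dogs and that scoring in the top quartile means scoring strictly above it. The confidence-interval method is not specified in this caption, so it is not reproduced.

```python
from statistics import quantiles


def top_quartile_fold(breed_scores, all_scores):
    """Fold difference in the probability that a breed's dogs score above the
    all-dog 75th percentile, relative to all dogs (about 0.25 by construction)."""
    q75 = quantiles(all_scores, n=4)[2]
    p_breed = sum(s > q75 for s in breed_scores) / len(breed_scores)
    p_all = sum(s > q75 for s in all_scores) / len(all_scores)
    return p_breed / p_all


print(top_quartile_fold([7, 8, 1, 2], list(range(1, 9))))  # 2.0
```

Using the empirical all-dog fraction rather than a fixed 0.25 keeps the ratio well behaved when many dogs share the threshold score.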
Fig. S12. Full set of population peculiarity scores (page 1/4). For each population
tested, empirical z scores were measured over 500,000 random permutations. Circle size
scales with abs(z) and color scales with z, with significant differences marked with “+” or
“-”. (A) Results comparing dogs of each year of age, rounded down (such that “Y01” is
dogs between 0 and 1 years of age), to randomly sampled dogs, with a sample size of 100.
Fig. S12. Full set of population peculiarity scores (page 2/4). (B) Results comparing
dogs from each breed to randomly sampled dogs.
Fig. S12. Full set of population peculiarity scores (page 3/4). (B) Results comparing
dogs from each breed to randomly sampled dogs.
Fig. S12. Full set of population peculiarity scores (page 4/4). (B) Results comparing
dogs from each breed to randomly sampled dogs.
Fig. S13. Features of Population Peculiarity Scores (PPS). Permutation testing of
breeds compared to random dogs yielded a two-tailed z score reflecting how differentiated
the breed is on each factor and question. (A) PPS z scores for candidate and confirmed
purebred dogs are strongly correlated for both factors (left) and questions (right), with
confirmed purebreds almost always (91.4% of tests) yielding more extreme z scores
(median z score change of 2.21-fold for factors and 2.15-fold for questions). Results are
shown for four breeds with >100 confirmed purebred dogs. (B) Comparison of absolute z
scores across question types shows that breeds are most differentiated for physical traits,
and that physical trait related questions, motor patterns, and factors score higher than
other behavioral questions. Differences are more pronounced in the candidate purebred
analysis (top), which includes up to 40 breeds per question/factor, than in the confirmed
purebred analysis (bottom), with just 4-5 breeds. P-values are from t-tests with FDR
correction (BH procedure). (C) Differentiation in dogs grouped by year of age shows that
scores for some questions/factors are highly correlated with age. The most correlated 10%
of points are labeled.
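The permutation z score behind the PPS figures can be sketched as follows, under the assumption that the compared statistic is the group's mean response; the exact PPS statistic is described elsewhere in the methods, so treat this only as an illustration of the permutation logic. The captions report 500,000 permutations; the default here is smaller for speed.

```python
import random
from statistics import mean, stdev


def permutation_z(group_scores, all_scores, n_perm=10_000, seed=0):
    """z score of a group's mean response relative to the means of equally
    sized random samples (without replacement) from all dogs."""
    rng = random.Random(seed)
    k = len(group_scores)
    observed = mean(group_scores)
    null = [mean(rng.sample(all_scores, k)) for _ in range(n_perm)]
    return (observed - mean(null)) / stdev(null)
```

For the age analysis in Fig. S12A, `group_scores` would be the responses of 100 dogs of a given age; for breeds, the responses of that breed's dogs.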
Fig. S14. No behaviors are exclusive to a subset of breeds. For breeds represented by
>50 dogs (44 candidate breeds and 15 confirmed breeds), the fraction of owners choosing
an answer at either end of the 5-level Likert scale (Never, Always, Strongly Agree, or
Strongly Disagree) never reaches 100%.
Fig. S15. The 30 questions (the top quartile) whose responses are most correlated
(either negatively or positively) with year of age. The PPS z score after 500,000
permutations is on the y-axis, and the Pearson correlation, with FDR-corrected p-value (BH
procedure), is printed in the lower left or upper right of each plot. Words in parentheses
show the direction of the score. For example, “(agree to disagree)” means low scores
indicate differentiation towards “agree” and high scores indicate differentiation towards
“disagree”.
Fig. S16. PPS correlation with year of age for all eight behavioral factors. The PPS z
score after 500,000 permutations is on the y-axis, and the Pearson correlation, with
FDR-corrected p-value (BH procedure), is printed in the upper left of each plot. The
direction of differentiation is printed next to the y-axis with a two-headed arrow. Six
factors (all except factors 4 and 5) are correlated with age.
Fig. S17. Breed stereotypes versus breed peculiarity scores. (A) AKC’s three
descriptive words. Analysis of both single words and words grouped by shared meaning
(including all words and word groups describing >4 breeds) using t-tests estimates small
differences in means for most words. Error bars show 95% confidence intervals around the
t-test estimates. (B) Pearson correlation between behavior scores from “Encyclopedia of
Dog Breeds” (19) and breed PPS.
herding: australian cattle dog, australian shepherd, belgian malinois, border collie, catahoula leopard dog, collie, german shepherd dog, pembroke welsh corgi, shetland sheepdog

hound: basset hound, beagle, dachshund, greyhound, rhodesian ridgeback, treeing walker coonhound, whippet

sporting: american cocker spaniel, brittany, english springer spaniel, german shorthaired pointer, golden retriever, labrador retriever, vizsla

terrier: american staffordshire terrier, jack russell terrier, miniature schnauzer, rat terrier, staffordshire bull terrier, west highland white terrier

toy: bichon frise, cavalier king charles spaniel, chihuahua, havanese, maltese, miniature pinscher, papillon, pomeranian, pug, shih tzu, toy poodle, yorkshire terrier

working: alaskan malamute, bernese mountain dog, boxer, doberman pinscher, great dane, great pyrenees, rottweiler, siberian husky


Fig. S18. Historic working role versus breed peculiarity scores. This plot visualizes
the difference in mean breed PPS score on each behavioral factor between breeds
assigned a particular historic working role (grey filled circles) and other breeds (box
plot). The arrow highlights the difference in the mean PPS score between breeds with a
particular historic working role and other breeds. Vertical lines mark the group mean,
boxes enclose the interquartile range (Q25-Q75), and horizontal lines extend from 1.5
interquartile ranges (IQR) below Q25 to 1.5 IQR above Q75. Darker arrows are those
reaching nominal significance (p < 0.05). Red arrows are those significant after FDR
correction (BH procedure). “(ns)” indicates shifts that are not significant after FDR
correction.
Fig. S19. Performance of Mendel’s Mutts survey participants. (A) On average,
participants correctly guessed breeds comprising a total of 20.8% of the breed ancestry in
the mutts. (B) Breeds comprising a higher proportion of a mutt’s ancestry are easier to
guess, with all breeds > 45% guessed correctly by more than half of participants. Red text
shows Pearson correlation. (C) Entropy decision stump analysis assessing how the
presence or absence of a specific trait in a mutt affects the probability that users guess
it to have ancestry from a given breed. Points indicate the impact calculated for all 30
MuttMix mutts; bars span the range of values calculated when the analysis was repeated
iteratively, excluding each of the mutts in turn. Values in red indicate that a given trait
significantly impacts the probability of users guessing ancestry from a given breed.
[Fig. S20 panels; x-axes: fraction ancestry in Darwin’s Ark dogs; fraction AKC registrations (2000-2015); position in MuttMix dropdown menu]
Fig. S20. MuttMix survey participants guessed popular breeds more often. The rate
at which MuttMix users guessed a given breed ancestry is strongly correlated with the
popularity of that breed, whether measured by fraction ancestry in Darwin’s Ark (left) or
by fraction of AKC registrations (middle), and is not significantly correlated with the
position of a given breed in the dropdown menu used by participants to guess breeds
(right). Red text shows Pearson correlation.
Fig. S21. When guessing breed ancestry in mutts, people are more accurate than
random chance, but still often wrong. (A) Dogs with ancestry from more popular
breeds tend to be guessed more accurately. Measuring the observed-to-expected ratio
controls for this, and shows that people tend to be better than random chance (blue points;
black is worse than random) at making 1+, 2+, and 3 correct guesses for most mutts (77%,
73%, and 69% of mutts, respectively), but their accuracy is still low, particularly when
guessing more than one breed. Some dogs, like Cooper and Lilly, are more easily guessed
than others, like Scotch. Significance was calculated using a chi-squared test comparing
observed to expected. (B) Self-described dog professionals are slightly but significantly
better than non-professionals, with a higher observed/expected ratio for 1+, 2+, and 3
guesses for 93%, 87%, and 75% of dogs, respectively. Significance was measured using a
paired two-sample Wilcoxon test. The horizontal black line is the median, and black points
are individual dogs. Red lines connect the points for the same dog. Statistical tests were
performed with the R package ‘stats’ (version 4.1.1) in R version 4.1.1.
Fig. S22. Effect of breed ancestry in mutts. Results shown for all factors, physical
traits, physical trait related behaviors, and motor patterns. Red dots (with labels) are
statistically significant after FDR correction (BH procedure).
Fig. S23. Effects of breed ancestry in perspective. (A) Pearson correlation between
effects of breed ancestry in mutts estimated by the LMER models and breed PPS in
confirmed purebred dogs. Positive correlations indicate that LMER is concordant with PPS.
(B) Proportion of variance in factor score explained by the fixed effects of breed ancestry
(the marginal R²), as well as variance explained by the added random effects of age group
(2 years and under, adult dogs, and dogs 12 years and older) and genetic kinship (the


conditional R’) in models not reaching singularity (tolerance = 1x10°). 
Fig. S24. Overview of the breed effects from the ascertained ancestry of mutts
(LMER analysis) versus breed effects in owner surveys of reportedly purebred dogs
(PPS analysis). Each point is a breed-trait pair. Points are solid if the direction of the
breed effect is the same in both mixed-breed and purebred dogs. Points are colored if the
pair is significant in both analyses (orange), significant for breed ancestry effects only
(green), or significant for reported purebreds only (purple).
[Fig. S25 panels include (M) Factor 1: Human Sociability and (N) DOG is able to focus on a task in a distracting situation (e.g., loud or busy places, around other dogs)]
Fig. S25. Genome-wide association studies. All Manhattan plots with Q-Q plots and
phenotype distributions for the genome-wide association studies of all traits have been
published to Figshare (DOI: 10.6084/m9.figshare.16608793). Plots shown here are for the following
phenotypes: (A) body size as a quantitative trait in all dogs; (B)
gigantism and (C) dwarfism as binary traits with average-stature dogs as controls; several
coat phenotypes, including (D) white spotting, (E) coat length, (F) coat
texture, (G) roan and ticking coat patterns, (H) solid red versus solid yellow coat
color, and (I) brindle coat pattern; (J) body size in highly admixed dogs with <45% breed
ancestry; (K) Likert responses to whether the dog “gets stuck behind objects and is unable to
get around”; (L) whether dogs howl never, rarely, sometimes, often, or always; (M)
human sociability factor scores; and (N) Likert responses to whether the dog “is able to focus
on a task in a distracting situation (e.g., loud or busy places, around other dogs)”.
Fig. S26. Size prediction model performance. The optimal p-value cutoff is p from
1x10° to 1x10, with (A) accuracy (defined as a predicted value within ±0.5 of the true
value) falling at both higher and lower thresholds and (B) mean square error lowest at
5x10. Random forest models drawing from GWAS-selected SNPs consistently
outperform models drawing from a random pool of SNPs, which have an average
prediction accuracy of 0.52 ± 0.05 (SD). Among models based on GWAS-selected SNPs,
the most informative SNPs (defined as the top 100 ranked by Gini importance) carried
absolute effect sizes of 0.239 ± 0.093 (SD) in the height GWAS, a ~1.68-fold change
compared to 0.142 ± 0.066 (SD), the effect sizes for all GWAS-selected SNPs used to
build the model. The unselected models, too, derive predictive power from their most informative randomly
drawn SNPs: these carried absolute effect sizes of 0.044 ± 0.034 (SD) in the GWAS, a ~1.69-fold change
compared to 0.026 ± 0.024 (SD), the effect sizes for all SNPs used to build the model.
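The performance measures in this caption can be sketched as below, assuming size is scored on a numeric ordinal scale and that “within ±0.5” is inclusive. The function names and example predictions are hypothetical; the fold-change inputs are the mean absolute effect sizes quoted in the caption.

```python
def accuracy_within(predicted, observed, tol=0.5):
    """Fraction of dogs whose predicted size is within +/- tol of the observed size."""
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tol)
    return hits / len(observed)


def mean_squared_error(predicted, observed):
    """Mean of squared prediction errors."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def fold_change(mean_top, mean_all):
    """Ratio of mean absolute effect sizes (top-ranked SNPs vs. all model SNPs)."""
    return mean_top / mean_all


# Hypothetical predictions on an ordinal size scale (e.g., 1 = ankle high ... 5 = hip high).
pred = [2.0, 3.4, 1.2]
obs = [2, 3, 2]
print(round(accuracy_within(pred, obs), 3))  # 0.667
print(round(fold_change(0.239, 0.142), 2))   # 1.68 (GWAS-selected models)
print(round(fold_change(0.044, 0.026), 2))   # 1.69 (randomly drawn SNPs)
```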
Fig. S27. Genetic predictions of size phenotypes perform well in mutts and purebred
dogs and validate against real size measurements. Size predictions made for (A) mutts 
are just as accurate against their surveyed size as in (B) purebred dogs. 
Fig. S28. Resolving the ear shape and body size locus at chromosome 10. Regional
plots on chromosome 10 from 7.6 to 8.8 megabases (CanFam3.1) for association with (A)
body size, colored by linkage disequilibrium (r²) with the top SNP at 8,356,059 (diamond), and (B) ear shape,
colored by r² with the top SNP at 8,027,948 (diamond). (C) The allele frequency and
effect (beta) of the top ear shape-associated SNP.
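The r² used to color regional plots such as these measures linkage disequilibrium between each SNP and the top SNP. A minimal sketch follows, assuming r² is approximated as the squared Pearson correlation of genotype dosages across dogs; dedicated tools may instead estimate r² from inferred haplotype frequencies. The function name and inputs are hypothetical.

```python
import math


def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation of genotype dosages (0/1/2) at two SNPs across dogs."""
    # Keep only dogs genotyped at both SNPs (None marks a missing call).
    pairs = [(a, b) for a, b in zip(dosages_a, dosages_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return math.nan  # r^2 is undefined for a monomorphic SNP
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for five dogs at the top SNP and a candidate SNP.
top_snp = [0, 1, 2, 1, None]
candidate = [0, 1, 2, 2, 1]
print(round(ld_r2(top_snp, candidate), 3))  # 0.727
```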
Fig. S29. Confounding effects of breed-defining physical traits on genome-wide
associations. Regional association plots for (A) the locus on chromosome 32 associated
with a dog’s focus in distracting situations (Q21), with SNPs colored by linkage disequilibrium (r²) with
the top associated SNP (diamond), and (B) the locus after inclusion of genotypes for the
SNP associated with coat length differences (square) as a covariate. (C) Survey responses
for Q21 against reported coat length.
Fig. S30. Breed-confounding physical traits can be addressed by controlling for the
traits themselves or by adjusting for population structure. (A) Before adding PCs to
the mixed linear model, we identified one significant (p < 5x10°°) peak at FGF5 on
chr32, a gene also associated with coat length, and 4 other suggestive (p < 1x10°)
associations. (B) After including the fur length SNP in FGF5 (chr32: 4,509,367), this
association disappears (p = 0.0001) but suggestive associations remain. (C) After inclusion
of the first 10 PCs, the association with FGF5 (p = 1.2x10~) drops in strength, as do other
suggestive associations.
[Figure panel: dot plot of enrichment results. Columns: canine body size GWAS regions, GTEx (non-brain tissues), GTEx brain genes, psychiatric condition gene sets. Rows grouped as behavior factors (F1 Human Sociability, F3 Toy-directed Motor Patterns, F7 Environmental Engagement, F8 Proximity Seeking) and behavior questions (Q4 Enjoys playing with toys, Q5 Bored in play quickly, Q7 Excitement leads to repetitive behavior, Q13 Boisterous, Q16 Guards coveted items, Q22 Playful with dogs, Q23 Comes when called, Q33 Sleeps more, Q38 Shows barrier aggression, Q44 Aggressive to threatening people, Q45 People person, Q46 Fearful to unfamiliar people, Q48 Aggressive to unfamiliar people, Q52 Works at tasks, Q54 Retrieves objects (binary), Q66 Lifts leg to urinate, Q70 Drinks water quickly, Q72 Shares toys with dogs, Q73 Dominant over dogs, Q75 Friendly to dogs, Q80 Fails to recognize familiar people, Q84 Not keen on new situations, Q103 Seems dull or depressed, Q104 Anxious, Q109 Aloof, Q122 Fur color (solid yellow vs. solid red), Q122 Fur color (ticking piebald only), Q124 Curly tail, Q125 Ear shape).]
Fig. S31. Gene set enrichment of associated regions. The raw and FDR-adjusted (BH
procedure) p-values for enrichment of trait-associated SNPs in (1) the canine body size
GWAS regions, (2) top non-brain tissue-expressed genes from GTEx, (3) top brain-expressed
genes from GTEx, and (4) psychiatric condition gene sets.
[Figure panels: distribution of pairwise kinship (y-axis, 0-100% of pairs) for all pairs (2,320,935 pairs), pairs in which both dogs are candidate purebreds (435,711 pairs), pairs in which both dogs are confirmed purebreds (200,028 pairs), and pairs in which both dogs are mutts (744,810 pairs).]
Fig. S32. Kinship between pairs of dogs in the genetic data set. We estimated kinship
k using the KING-robust kinship estimator (//8) and found that only 28 pairs of dogs
(0.0017%; 46 unique dogs) were closely related (k > 0.2).
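The caption cites the KING-robust estimator without giving its formula. As a sketch, assuming the between-family form of Manichaikul et al. (2010), which normalizes by the smaller heterozygote count of the pair (similar to what tools such as PLINK 2 `--make-king` report), kinship for two dogs can be computed from genotype dosages as below. The function name and the input encoding (0/1/2 alternate-allele counts, `None` for missing) are hypothetical.

```python
def king_robust_kinship(g1, g2):
    """Between-family KING-robust kinship for two dogs from 0/1/2 genotype dosages."""
    hethet = ibs0 = het1 = het2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # only SNPs called in both dogs contribute
        het1 += int(a == 1)
        het2 += int(b == 1)
        hethet += int(a == 1 and b == 1)
        ibs0 += int(abs(a - b) == 2)  # opposite homozygotes (AA vs aa)
    n_min = min(het1, het2)
    if n_min == 0:
        raise ValueError("both dogs need at least one heterozygous call")
    # phi = (N_AaAa - 2 N_AA,aa) / (2 N_min) + 1/2 - (N_Aa(1) + N_Aa(2)) / (4 N_min)
    return (hethet - 2 * ibs0) / (2 * n_min) + 0.5 - (het1 + het2) / (4 * n_min)


# A dog compared with itself has kinship 0.5; the caption flags pairs with k > 0.2.
g = [0, 1, 2, 1, 1, 0]
print(king_robust_kinship(g, g))  # 0.5
```

Unrelated dogs give values near zero and can be negative.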
1009: “Bailey”
"Black patches on his tongue. Coat is very soft.” 


Traits: 

Size = thigh-high White = no 

Ears = down CoatLength = long 
Colors = 2 CoatType = straight 


1010: “Bella” 
“Double coat.” 


Traits: 

Size = knee-high White = piebald 

Ears = [illegible] CoatLength = long
CoatType = straight 


1011: “Beskow” 


No additional information. 


Traits: 

Size = thigh-high White = piebald 
CoatLength = short 
CoatType = straight 


1012: “Boone” 
“He has black patches on his tongue and his coat 


Size = knee-high White = piebald 
Ear: CoatLength = short 
CoatType = straight 


1013: “Buddy” 
“Double coat, with guard hairs. Different colored 
nails, some white and some black. His nose faded 
from black to pink as he aged.”


White = no 
CoatLength = medium 
CoatType = straight 


[Photograph residue: partially legible breed labels with percentages, e.g., Chow Chow (10.6%), Basset Hound (48.1%).]


1014: “Clarence” 
“Purple spots on tongue. Light speckles in fur. The 
skin on his belly and legs is mostly black but with 
some light freckling.” 


Traits: 

Size = thigh-high White = no 

Ears = up CoatLength = short 
Colors = 2 CoatType = straight 


1015: “Cooper” 
“Bony dew claws on hind legs and webbed paws 
in front. Top coat is black, but undercoat is greyish 
tan and very dense.” 


Traits: 

Size = thigh-high White = minimal 
Ears = down CoatLength = short 
Colors = 1 CoatType = straight 


1016: “Dug” 


No additional information. 


White = piebald 
CoatLength = short 
CoatType = straight 


1017: “Esme” 
“Spots on belly.” 


Traits: 

Size = knee-high White = piebald 

Ears = up CoatLength = short 
CoatType = straight 


1018: “Gus” 
“Thick coat, freckles on nose, one ear is floppy and 
one stands up, curly tail, webbed toes.” 


Traits: 

Size = thigh-high White = no 

Ears CoatLength = short 
CoatType = straight 


[Photograph residue: partially legible breed labels with percentages, e.g., Rottweiler (7.80%), Chow Chow (10.02%), American Pit Bull Terrier (24.88%), Staffordshire Bull Terrier (11.4%), German Shepherd Dog (47.6%), Beagle (31.1%), American Pit Bull Terrier (17.8%).]


Fig. S33. Dogs included in the MuttMix survey (page 1/4). Front and side photographs
and descriptions of the 30 mutts (black border) and single undeclared purebred (red). All
photos and information were taken and provided by the dog owners for use in the public
survey and research project.
1019: “Hershey”
“Brown nose, coat and eyes. Ears can be upright, 
floppy or pinned. Fur is soft except for a wirey 
stripe down his back.”


Traits: 

Size = knee-high White = no 

Ears = up CoatLength = short 
Colors = 1 CoatType = straight 


1020: “Hopper” 
“Purple spotted tongue, bluish skin on belly.” 


White = urajiro 
CoatLength = long 
CoatType = straight 


1021: “Jack” 


No additional information. 


White = minimal 
CoatLength = short 
CoatType = straight 


1022: “Jack” 
No additional information. 


Size = knee-high White = minimal 
Ears = up CoatLength = medium 
CoatType = straight 


1023: “Kaylee” 


No additional information. 


Traits: 

Size = knee-high White = piebald 
Ears = down CoatLength = short 
Colors = 3 CoatType = straight 


1024: “Lily” 
“Purple tongue and coarse fur except for around 


[Photograph residue: partially legible breed label: Jack Russell Terrier (6.77%).]


White = no 
CoatLength = medium 
CoatType = straight 


1025: “Lola” 
“Black spots skin and very soft silky hair. Lola does
not have double dentition.”

White = piebald
CoatLength = medium
CoatType = wavy


1026: “Lucy” 
“Somewhat double coat. Her skin is both light 
and dark. Ears stand up when she is alert and tail 
curls up.” 


[Photograph residue: partially legible breed labels with percentages, e.g., Yorkshire Terrier (25.2%), [illegible] Terrier (21.8%).]

White = minimal 
CoatLength = short 
CoatType = straight 




1027: “Luna” 
No additional information. 


White = no 
CoatLength = medium 
CoatType = straight 


1028: “Maxine” 
“Hair not fur.” 


[Photograph residue: partially legible breed labels with percentages, e.g., Golden Retriever (6.7%), Australian Cattle Dog (5.1%).]

White = no
CoatLength = long
CoatType = wavy


Fig. S33. Dogs included in the MuttMix survey (page 2/4). Front and side photographs
and descriptions of the 30 mutts (black border) and single undeclared purebred (red). All
photos and information were taken and provided by the dog owners for use in the public
survey and research project.
1030: “Peso”
“Jet black, no markings.” 


Traits: 

Size = knee-high White = no 

Ears = down CoatLength = short 
Colors = 1 CoatType = straight 


1031: “Ramy” 
“Spots on legs.” 


Traits: 

Size = knee-high White = piebald 
CoatLength = medium 
CoatType = wirey 


1032: “Reilly” 
“Thick orange stripe down his back. Long fine 
black guard hairs on his ear flaps.” 


Traits: 

Size = thigh-high White = urajiro
Ears = down CoatLength = short 
Colors = 2 CoatType = straight 


1033: “Rex” 
No additional information. 


Size = thigh-high White = minimal 
Ears = up CoatLength = short 
Colors = 1 CoatType = straight 


1034: “Rosie” 
“Stripe of wavy fur down center of back. Black 
spots on tongue.”


Traits: 

Size = knee-high White = no 

Ears = down CoatLength = short 
Colors = 2 CoatType = straight 


1035: “Ruby” 
“White blaze on chest, white socks.”


White = minimal 
CoatLength = short 
CoatType = straight 


1036: “Rudy” 
“White tail tip and freckles on chest.” 


White = minimal 
CoatLength = short 
CoatType = straight 


1037: “Sadie” 
No additional information. 


White = minimal 
CoatLength = short 
CoatType = straight 


1039: “Scotch” 
“Has dew claws.” 


White = no 
CoatLength = medium 
CoatType = straight 


1042: “Zandy” 
“Multilayered fur and has light black spots on 
tongue.” 


White = no 
CoatLength = long 
CoatType = wavy 




Fig. S33. Dogs included in the MuttMix survey (page 3/4). Front and side photographs 
and descriptions of the 30 mutts (black border) and single undeclared purebred (red). All 
photos and information were taken and provided by the dog owners for use in the public 
survey and research project. 
1040: “Sophie”
“When hair grows out, she gets apricot-colored
hairs on her ears and some on her back.”


White = no 
CoatLength = medium 
CoatType = curly 


Fig. S33. Dogs included in the MuttMix survey (page 4/4). Front and side photographs
and descriptions of the 30 mutts (black border) and single undeclared purebred (red). All
photos and information were taken and provided by the dog owners for use in the public
survey and research project.
Fig. S34. SNPs strongly associated with physical traits only are more differentiated
between breeds than random loci. For each SNP, we calculated the maximum FST
observed between any one of the ten most common breeds in the GWAS dataset
(American pit bull terrier, Australian cattle dog, beagle, border collie, Chihuahua,
dachshund, German shepherd dog, golden retriever, Labrador retriever, toy poodle) and
the full dog population. Brackets with p-values indicate whether each category of
associated SNPs has higher FST values than randomly sampled SNPs (fourth column),
measured using a one-sided t-test (R package stats version 4.1.1). Red text is the mean
and standard deviation for each category. Boxplot: the box encompasses the 25-75% of
values with a thick horizontal line at the median; whiskers show the range of 5-95% of
values; points represent values outside the 5-95% range.
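The caption does not name the FST estimator used. As a minimal sketch, assuming the heterozygosity-based form FST = (H_T - H_S) / H_T with the two groups weighted equally (published analyses often use the Weir and Cockerham or Hudson estimators instead), the per-SNP maximum across breeds could be computed as below. The function names and allele frequencies are hypothetical.

```python
def fst_two_pop(p1, p2):
    """Heterozygosity-based F_ST, (H_T - H_S) / H_T, for one biallelic SNP."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled groups
    if h_t == 0:
        return 0.0  # SNP is monomorphic in both groups
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group heterozygosity
    return (h_t - h_s) / h_t


def max_breed_fst(breed_freqs, population_freq):
    """Maximum F_ST between any single breed and the full population at one SNP."""
    return max(fst_two_pop(p, population_freq) for p in breed_freqs.values())


# Hypothetical alternate-allele frequencies at one SNP.
freqs = {"beagle": 0.5, "chihuahua": 0.0}
print(round(max_breed_fst(freqs, 0.5), 3))  # 0.333
```

Repeating this for every SNP in a category yields the distributions compared in the boxplots.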
Table S1. Survey questions in Darwin's Ark.


Survey # Source Question type Text Shorthand Range Options* # Responses Note on question type 
1 CHQLS other behavior DOG enjoys life Enjoys life quantitative Agree to disagree 7,637 
2 CHQLS motor pattern DOG wants to play Wants to play quantitative Agree to disagree 7,564 play is characterized by expression of motor patterns such as chase and grab-bite
3 DIAS other behavior [illegible] excited [illegible] excited quantitative Agree to disagree 7,559
4 DPQ motor pattern DOG enjoys playing with toys Enjoys playing with toys quantitative Agree to disagree 7,566 derivative of grab-bite motor pattern
5 DPQ other behavior DOG gets bored in play quickly Bored in play quickly quantitative Agree to disagree 7,524
6 DPQ other behavior DOG seeks constant activity Seeks constant activity quantitative Agree to disagree 7,523 
7 DIAS other behavior [illegible] repetitive behavior Excitement leads to repetitive behavior quantitative Agree to disagree 7,536
8 DIAS other behavior DOG calms down very quickly after being excited Calms down quickly quantitative Agree to disagree 7,500
9 DIAS other behavior [illegible] is frustrated with something Aggressive if frustrated quantitative Agree to disagree 7,499
10 DIAS other behavior DOG is not very patient Not patient quantitative Agree to disagree 7,127
other behavior [illegible] people [illegible] people quantitative Agree to disagree
other behavior DOG is shy Shy quantitative Agree to disagree
other behavior DOG is boisterous Boisterous quantitative Agree to disagree
other behavior DOG does not think before HE acts Does not think before acts quantitative Agree to disagree
other behavior [illegible] has done something wrong Sorry when wrong quantitative Agree to disagree
other behavior DOG aggressively guards coveted items (e.g., stolen item, treats, food bowl) Guards coveted items quantitative Agree to disagree
motor pattern DOG howls Howls quantitative Never to always derivative of howling motor pattern
motor pattern DOG woo-woo barks Woo-woo barks quantitative Never to always derivative of howling motor pattern
other behavior DOG whines when alone Whines when alone quantitative Never to always
other behavior DOG whines to get attention, food, or toys Whines for attention, food, toys quantitative Never to always
other behavior DOG is able to focus on a task in a distracting situation (e.g., loud or busy places, around other dogs) Focus in distracting situation quantitative Agree to disagree
other behavior DOG is playful with other dogs Playful with dogs quantitative Agree to disagree
other behavior When off leash, DOG comes immediately when called Comes when called quantitative Agree to disagree
other behavior DOG likes to chase squirrels, birds, or other small animals Chases small animals quantitative Agree to disagree
other behavior DOG moves normally Moves normally quantitative Agree to disagree
other behavior DOG is as active as HE has been As active as has been quantitative Agree to disagree
other behavior DOG shows extreme physical signs when excited Extreme physical signs when excited quantitative [illegible]
other behavior DOG eats grass Eats grass quantitative Never to always
motor pattern DOG buries toys / bones Buries toys quantitative Never to always derivative of caching motor pattern
other behavior DOG eats non-food items Eats non-food items quantitative Never to always
other behavior DOG ignores commands Ignores commands quantitative Agree to disagree
other behavior DOG is quick to sneak out through open doors, gates Quick to sneak out doors quantitative Agree to disagree
other behavior DOG sleeps more, is less awake Sleeps more quantitative Agree to disagree
other behavior DOG is slow to respond to corrections Slow to respond to corrections quantitative Agree to disagree
other behavior [illegible] or enclosures [illegible] enclosures quantitative Never to always
other behavior DOG gets stuck behind objects and is unable to get around Gets stuck behind objects quantitative Never to always
other behavior [illegible] patted Avoids being patted quantitative Never to always
other behavior DOG shows barrier aggression Shows barrier aggression quantitative Never to always
other behavior DOG lays in one place all day long Lays in one place quantitative Never to always
other behavior DOG damages doors, gates, or walls Damages doors quantitative Never to always


other behavior DOG doesn't like to be approached or hugged Dislikes being approached or hugged quantitative Agree to disagree
other behavior DOG is friendly towards unfamiliar people Friendly to unfamiliar people quantitative Agree to disagree
other behavior DOG responds to my presence Responds to my presence quantitative Agree to disagree
other behavior DOG behaves aggressively in response to perceived threats from people (e.g., being cornered, having collar reached for) Aggressive to threatening people quantitative Agree to disagree
other behavior DOG is a people person People person quantitative Agree to disagree
other behavior DOG behaves fearfully towards unfamiliar people Fearful to unfamiliar people quantitative Agree to disagree
other behavior DOG seems to get excited for no reason Excited for no reason quantitative Agree to disagree


Socialization with Humans 


other behavior DOG behaves aggressively towards unfamiliar people Aggressive to unfamiliar people quantitative Agree to disagree
other behavior DOG seeks companionship from people Seeks companionship from people quantitative Agree to disagree
other behavior DOG must greet everyone who comes to the door Must greet everyone quantitative Agree to disagree


other behavior DOG chases bicycles, joggers, and skateboarders Chases bicycles or joggers quantitative Agree to disagree
other behavior DOG works at tasks (e.g., getting treats out of a Kong, shredding toys) until entirely finished Works at tasks quantitative Agree to disagree
other behavior DOG leaves food or objects alone when told to do so Leaves food or objects when told quantitative Agree to disagree
motor pattern DOG retrieves objects (e.g., balls, toys, sticks) Retrieves objects quantitative Agree to disagree derivative of chase/grab-bite motor pattern
other behavior [illegible] control over how HE responds Control over responses quantitative Agree to disagree
other behavior DOG reacts very quickly Reacts very quickly quantitative Agree to disagree
other behavior DOG is easy to train Easy to train quantitative Agree to disagree
other behavior DOG can be very persistent Very persistent quantitative Agree to disagree
motor pattern DOG points Points quantitative Agree to disagree derivative of eye-stalk motor pattern
physical trait related DOG avoids getting wet Avoids getting wet quantitative Agree to disagree potentially influenced by presence [illegible]
other behavior [illegible] dropped on the floor [illegible] dropped food quantitative Never to always
other behavior DOG licks HIS empty bowl after finishing the food Licks empty bowl quantitative Never to always
other behavior DOG licks tile / linoleum floors Licks tile floors quantitative Never to always
other behavior DOG turns in circles before pooping Circles before pooping quantitative Never to always
other behavior DOG kicks or scratches the ground after defecating Kicks ground after defecating quantitative Never to always
physical trait related DOG lifts HIS leg to urinate Lifts leg to urinate quantitative Never to always potentially related to hip/joint morphology or leg length
other behavior DOG marks with feces Marks with feces quantitative Never to always
other behavior DOG eats HIS own feces Eats own feces quantitative Never to always
other behavior DOG shows submissive urination Submissive urination quantitative Never to always
physical trait related DOG drinks water quickly Drinks water quickly quantitative Never to always potentially related to dog size and/or face shape


other behavior DOG behaves aggressively toward other dogs Aggressive to dogs quantitative Agree to disagree
other behavior DOG willingly shares HIS toys with other dogs Shares toys with dogs quantitative Agree to disagree


other behavior DOG is dominant over other dogs Dominant over dogs quantitative Agree to disagree
other behavior DOG avoids other dogs Avoids dogs quantitative Agree to disagree
other behavior DOG is friendly towards other dogs Friendly to dogs quantitative Agree to disagree
other behavior DOG knows HE is a dog Knows is a dog quantitative Agree to disagree
other behavior DOG behaves fearfully towards other dogs Fearful to dogs quantitative Agree to disagree
other behavior DOG behaves aggressively towards cats Aggressive to cats quantitative Agree to disagree


Socialization with Animals 


other behavior DOG is assertive or pushy with other dogs (e.g., if in a home with other dogs, when greeting) Assertive with dogs quantitative Agree to disagree
other behavior DOG sometimes fails to recognize familiar people or pets Fails to recognize familiar people quantitative Agree to disagree
81 DIAS, DPQ other behavior DOG is very interested in and adapts easily to new things and new places [illegible] quantitative Agree to disagree 4,599
82 DPQ other behavior DOG exhibits fearful behaviors when HE is restrained Fearful when restrained quantitative Agree to disagree 4,503
83 DPQ other behavior DOG behaves fearfully when groomed (e.g., nails trimmed, brushed, bathed, ears cleaned) Fearful when groomed quantitative Agree to disagree 4,546
84 DIAS other behavior DOG is not keen to go into new situations Not keen on new situations quantitative Agree to disagree 4,494
85 DPQ other behavior DOG [illegible] aggressively [illegible] visits to the veterinarian Aggressive at vet quantitative Agree to disagree 4,511
86 new other behavior DOG is highly sensitive to noise Sensitive to noise quantitative Agree to disagree 4,526
87 DPQ other behavior [illegible] nervous or fearful [illegible] nervous quantitative Agree to disagree 4,516
88 DPQ other behavior DOG behaves fearfully during visits to the veterinarian Fearful at vet quantitative Agree to disagree 4,503
89 DIAS other behavior [illegible] interest in new things [illegible] new quantitative Agree to disagree 4,523
90 new other behavior DOG is afraid of storms Afraid of storms quantitative Agree to disagree 4,187


other behavior DOG behaves submissively (e.g., rolls over, avoids eye contact, licks HIS lips) when greeting other dogs Submissive to dogs quantitative Agree to disagree
physical trait related DOG rests frog style Rests frog style quantitative Never to always potentially related to hip morphology
other behavior DOG paces up and down, walks in circles and/or wanders with no direction or purpose Paces or circles or [illegible] quantitative Agree to disagree
physical trait related DOG pants frequently, even at rest Pants frequently quantitative Agree to disagree potentially related to dog size and/or face shape
other behavior DOG stares blankly at the walls or floor Stares blankly quantitative Never to always
physical trait related DOG tilts HIS head Tilts head quantitative Never to always potentially related to [illegible] skull shape
physical trait related [illegible] occasionally Shakes occasionally quantitative Agree to disagree [illegible] capacity to thermoregulate
physical trait related DOG crosses HIS front paws Crosses front paws quantitative Never to always potentially influenced by bone & joint morphology
other behavior [illegible] preference Shows handedness quantitative Never to always
other behavior DOG places HIS paw on my or other people's feet Places paw on people's feet quantitative Never to always
other behavior DOG is lethargic Lethargic quantitative Agree to disagree
other behavior DOG is confident Confident quantitative Agree to disagree
other behavior DOG seems dull or depressed, not alert Seems dull or depressed quantitative [illegible]
other behavior DOG is anxious Anxious quantitative Agree to disagree
other behavior DOG [illegible] impulsive Impulsive quantitative Agree to disagree
other behavior DOG is curious Curious quantitative Agree to disagree
other behavior DOG is affectionate Affectionate quantitative Agree to disagree
other behavior DOG tends to be calm Calm quantitative Agree to disagree
other behavior DOG is aloof Aloof quantitative Agree to disagree
other behavior DOG has more good days than bad days More good days quantitative Agree to disagree
physical trait physical trait When DOG is standing next to someone of average height, how high are HIS shoulders? quantitative ankle high or shorter | calf high | knee high | thigh high | hip high or higher
physical trait physical trait What color is DOG? Select all that apply. Fur color multiple choice white | red (from pale peach to dark red or liver colored) | yellow (from pale cream to orange, gold, fawn, or wheaten) | gray (slate, blue gray, charcoal, or silver) | chocolate brown | pure black | merle | brindle pigmentation
physical trait physical How much white fur does DOG have? Select the image with the closest amount of white. White fur quantitative image options (see figure S3A) pigmentation
physical trait physical Is DOG's tail curly? Curly tail binary no | yes | I don't know tail shape


physical trait physical What is DOG's ear shape? Select the image with the closest ear shape. Ear shape categorical image options (see figure S3B) ear shape
physical trait physical Are DOG's eyes different colors? Eyes different colors binary no | yes | I don't know pigmentation
physical trait physical How long is DOG's fur on HIS back and sides? Measure it against your finger. Fur length quantitative short | medium | long | I'm not sure fur type
physical trait physical Does DOG have soft fur or rough and bristly fur? Fur texture binary soft | rough (wiry) | I'm not sure fur type

* Agree to Disagree: Strongly Agree | Agree | Neither Agree Nor Disagree | Disagree | Strongly Disagree
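Likert responses like the “Agree to disagree” and “Never to always” options in Table S1 have to be mapped to ordered numbers before they can be analyzed. A minimal sketch of one such coding follows, assuming 0-based integer scores; the function name, the `reverse` flag, and the capitalization of the never-to-always options are assumptions, not the project's actual scoring scheme.

```python
# Option lists as given in the Table S1 footnote and the Fig. S25 caption.
AGREE_TO_DISAGREE = ["Strongly Agree", "Agree", "Neither Agree Nor Disagree",
                     "Disagree", "Strongly Disagree"]
NEVER_TO_ALWAYS = ["Never", "Rarely", "Sometimes", "Often", "Always"]


def likert_score(response, scale, reverse=False):
    """Map an option to 0..len(scale)-1; reverse=True scores the first option highest."""
    idx = scale.index(response)  # raises ValueError for an unknown option
    return len(scale) - 1 - idx if reverse else idx


print(likert_score("Strongly Agree", AGREE_TO_DISAGREE, reverse=True))  # 4
print(likert_score("Sometimes", NEVER_TO_ALWAYS))  # 2
```

Reversing the agreement scale makes higher scores mean stronger agreement, so both scale types increase in the direction of “more of the behavior”.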
Table S2. Breed information, stereotypes, and reference panel.


Breed AKC Registrations (2000-2015) % candidate purebred dogs (Darwin's Ark) AKC Group AKC Three-word Descriptors Encyclopedia of Dog Breeds Breed Reference Panel


labrador retriever 15.18% 10.29% sporting active, friendly, outgoing 


german shepherd dog 5.95% 4.86% herding confident, courageous, smart


golden retriever devoted, friendly, intelligent 


Encyclopedia of Dog Breeds rating columns: exercise requirements, playfulness, affection level, friendliness towards [illegible] (three columns), ease of training, watchdog ability, protection ability, energy level. Breed Reference Panel columns: Sources, Low pass + imputation, Illumina array, Axiom array.


chihuahua 4 nar charming, graceful, sassy 


american pit bull terrier 
border collie affectionate, energetic, smart
australian shepherd exuberant, smart, work-oriented
beagle curious, friendly, merry
curious, friendly, spunky
non-sporting active, proud, very smart
jack russell terrier alert, inquisitive, lively
australian cattle dog alert, curious, pleasant
boxer active, bright, fun-loving
siberian husky loyal, mischievous, outgoing
shih tzu affectionate, outgoing, playful
yorkshire terrier affectionate, sprightly, tomboyish
toy poodle agile, intelligent, self-confident
charming, loving, mischievous
miniature schnauzer friendly, obedient, smart
boston terrier non-sporting amusing, bright, friendly
pomeranian bold, inquisitive, lively
pembroke welsh corgi
doberman pinscher alert, fearless, loyal
shetland sheepdog bright, energetic, playful
cavalier king charles spaniel 0.86% affectionate, gentle, graceful
rottweiler confident guardian, loving, loyal
greyhound gentle, independent, noble
great pyrenees calm, patient, smart
collie devoted, graceful, proud
maltese charming, gentle, playful
staffordshire bull terrier terrier brave, clever, tenacious


shiba inu 
basset hound 


german shorthaired pointer 1.49% 


american cocker spaniel 1.74% 


papillon 0.56% 


french bulldog 


havanese 


belgian malinois 


brittany 


bernese mountain dog 


english bulldog 


rhodesian ridgeback 0.31%


west highland white terrier 0.74% 


whippet 0.22% 
english springer spaniel 


cairn terrier 


mastiff 


lhasa apso 


alaskan malamute 


airedale terrier 0.29% 0.31% 8 0 i) 3 33134 




0.38% 


0.31% 


weimaraner 0.85% 
chow chow 


dalmatian 


0.31% 


portuguese water dog 


newfoundland 
italian greyhound 


saint bernard 


border terrier 


nova scotia duck tolling 


bullmastiff 


english setter 


soft coated wheaten terrier 


english cocker spaniel 


samoyed 


shar pei 


tibetan terrier 
chinese crested 


basenji 


scottish terrier 


irish setter 


bull terrier 


chesapeake bay retriever 


irish wolfhound 
tibetan spaniel 


pekingese 0.44% 


borzoi 0.09% 


greater swiss mountain dog 0.09% 


saluki 


bearded collie 


afghan hound 
norwegian elkhound 


schipperke 


wire fox terrier 


old english sheepdog 
belgian tervuren 


chinook NA 


wirehaired pointing griffon 0.07% 


tibetan mastiff NA 
gordon setter 


norfolk terrier 


finnish spitz 


entlebucher 


Sources:
1 NHGRI Dog Genome Project (Elaine Ostrander) (40)
2 Broad Institute (BioProject PRJNA683923)
3 Cornell Canine Dataset (Hayward et al 2016, doi:10.1038/ncomms10460) (118)
4 Darwin's Ark (BioProject PRJNA675863)
5 BarkBase (Megquier et al 2019, doi:10.3390/genes10060433) (102)
6 Embark Veterinary, Inc. (Darwin's Ark participant-submitted raw data)
7 National Entlebucher Mountain Dog Association (BioProject PRJNA683923)




Table S3. Descriptive statistics for the exploratory factor analysis (110 items, 10,252 dogs).

Statistic | Factor 6 | Factor 7 | Factor 8 | Factor 9
(label not preserved) | 0 | 0 | 0 | 0
(label not preserved) | 0.92 | 0.84 | 0.87 | 0.82
Range | 6.2 | 7.24 | 7.18 | 6.81
Minimum | -3.64 | -2.48 | -2.5 | -3.14
Maximum | 2.56 | 4.76 | 4.68 | 3.67
25% Percentile | -0.56 | -0.59 | -0.6 | -0.55
50% Percentile | 0.13 | -0.06 | -0.14 | 0.01
75% Percentile | 0.68 | 0.53 | 0.49 | 0.56
Sum of Squared Loading | 2.98 | 4.01 | 2.42 | 2.54
Proportion Variance | 0.03 | 0.04 | 0.02 | 0.02
Cumulative Variance | 0.18 | 0.22 | 0.24 | 0.27

Statistic | Factor ? | Factor ? | Factor ? | Factor ?
(label not preserved) | 0.75 | 0.73 | 0.72 | 0.71
Range | 7.64 | 6.75 | 6.04 | 7.22
Minimum | -3.57 | -3.77 | -3.07 | -2.46
Maximum | 4.07 | 2.98 | 2.97 | 4.76
25% Percentile | -0.5 | -0.49 | -0.48 | -0.49
50% Percentile | -0.01 | -0.01 | -0.01 | -0.03
75% Percentile | 0.48 | 0.47 | 0.48 | 0.46
Sum of Squared Loading | 1.44 | 1.42 | 1.05 | 1.54
Proportion Variance | 0.01 | 0.01 | 0.01 | 0.01
Cumulative Variance | 0.36 | 0.38 | 0.39 | 0.4
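Two relationships in Table S3 can be checked directly from the printed values. The first is that each factor's Proportion Variance equals its Sum of Squared Loading divided by the number of items (110). The second is that the statistic listed third for each factor (6.2 for factor 6) equals Maximum minus Minimum. The sketch below recomputes both for factors 6-9. Dividing by the item count is the usual convention for standardized items, which we assume here; the table itself does not state it. The function names are our own.

```python
# Factor 6-9 values transcribed from Table S3; 110 items per the table title.
N_ITEMS = 110

ss_loadings = {6: 2.98, 7: 4.01, 8: 2.42, 9: 2.54}
minimum = {6: -3.64, 7: -2.48, 8: -2.5, 9: -3.14}
maximum = {6: 2.56, 7: 4.76, 8: 4.68, 9: 3.67}
third_statistic = {6: 6.2, 7: 7.24, 8: 7.18, 9: 6.81}  # listed third per factor
reported_prop_var = {6: 0.03, 7: 0.04, 8: 0.02, 9: 0.02}


def proportion_variance(ss: float, n_items: int = N_ITEMS) -> float:
    """Share of total (standardized) item variance carried by one factor."""
    return ss / n_items


def check_factor(f: int) -> tuple[bool, bool]:
    """Compare recomputed values with the table at its 2-decimal precision."""
    prop_ok = round(proportion_variance(ss_loadings[f]), 2) == reported_prop_var[f]
    range_ok = round(maximum[f] - minimum[f], 2) == third_statistic[f]
    return prop_ok, range_ok


for factor in sorted(ss_loadings):
    print(factor, check_factor(factor))
```

Both checks hold for all four factors at the table's precision. The same subtraction applied to the second panel implies a maximum of 2.97 for the column whose minimum is -3.07.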
Table S4. Factors discovered from Darwin's Ark survey data.


Pattern Structure Question Scale 


Factor 1: Human Sociability


{- less sociable - | + highly sociable + } 


Fearful to unfamiliar people Strongly agree to strongly disagree 
DOG is friendly towards unfamiliar people Strongly agree to strongly disagree 
DOG is a people person Strongly agree to strongly disagree 
DOG is shy Strongly agree to strongly disagree 
DOG doesn't like to be approached or hugged Strongly agree to strongly disagree 
DOG behaves aggressively towards unfamiliar people Strongly agree to strongly disagree 
DOG must greet everyone who comes to the door Strongly agree to strongly disagree 
DOG is not keen to go into new situations Strongly agree to strongly disagree 
DOG is very interested in and adapts easily to new things and new places Strongly agree to strongly disagree


DOG seeks companionship from people Strongly agree to strongly disagree 


DOG is confident Strongly agree to strongly disagree 


DOG behaves aggressively in response to perceived threats from people (e.g., being cornered, having collar reached for) Strongly agree to strongly disagree
37. DOG walks away or avoids being patted Never to always 
88 DOG behaves fearfully during visits to the veterinarian Strongly agree to strongly disagree 
104 DOG is anxious Strongly agree to strongly disagree 
11 DOG is relaxed when greeting people Strongly agree to strongly disagree 
82. DOG exhibits fearful behaviors when HE is restrained Strongly agree to strongly disagree 


{ - aroused - | + composed + } 


DOG tends to be calm Strongly agree to strongly disagree 
DOG is considered to be very impulsive Strongly agree to strongly disagree 
DOG does not think before HE acts Strongly agree to strongly disagree 
DOG calms down very quickly after being excited Strongly agree to strongly disagree 
DOG seems to get excited for no reason Strongly agree to strongly disagree 
DOG is boisterous Strongly agree to strongly disagree 
DOG is not very patient Strongly agree to strongly disagree 
DOG seeks constant activity Strongly agree to strongly disagree 
DOG appears to have a lot of control over how HE responds Strongly agree to strongly disagree
DOG shows extreme physical signs when excited Strongly agree to strongly disagree 
DOG is relaxed when greeting people Strongly agree to strongly disagree 
Excitement can lead DOG to fixed repetitive behavior Strongly agree to strongly disagree 
104 DOG is anxious Strongly agree to strongly disagree 
DOG paces up and down, walks in circles and/or wanders with no direction or purpose Strongly agree to strongly disagree
DOG is able to focus on a task in a distracting situation (e.g., loud or busy places, around other dogs) Strongly agree to strongly disagree


DOG becomes aggressive when excited Strongly agree to strongly disagree 


Pattern Structure Question Scale
Factor 3: Toy-directed Motor Patterns


{- MPs toy-directed - | + MPs not toy-directed + } 


0.943 0.773 DOG enjoys playing with toys Strongly agree to strongly disagree 
0.751 0.721 DOG wants to play Strongly agree to strongly disagree 
0.719 0.651 DOG retrieves objects (e.g., balls, toys, sticks) Strongly agree to strongly disagree 
-0.554 -0.544 DOG gets bored in play quickly Strongly agree to strongly disagree 
NA 0.374 DOG seeks constant activity Strongly agree to strongly disagree
NA -0.317 DOG sleeps more, is less awake Strongly agree to strongly disagree 
NA -0.316 DOG is lethargic Strongly agree to strongly disagree 
NA 0.308 DOG enjoys life Strongly agree to strongly disagree 
NA 0.305 DOG takes a long time to lose interest in new things Strongly agree to strongly disagree 


Factor 4: Biddability {- biddable - | + independent + } 


-0.778 -0.707 DOG ignores commands Strongly agree to strongly disagree 
0.675 0.639 When off leash, DOG comes immediately when called Strongly agree to strongly disagree 
0.613 0.622 DOG is easy to train Strongly agree to strongly disagree 
-0.612 -0.597 DOG is slow to respond to corrections Strongly agree to strongly disagree 
0.463 0.508 DOG leaves food or objects alone when told to do so Strongly agree to strongly disagree 
-0.467 -0.485 DOG is quick to sneak out through open doors, gates Strongly agree to strongly disagree 
0.360 0.422 DOG is able to focus on a task in a distracting situation (e.g., loud or busy places, around other dogs) Strongly agree to strongly disagree

NA 0.315 DOG appears to have a lot of control over how HE responds Strongly agree to strongly disagree


Factor 5: Agonistic Threshold

DOG may become aggressive if HE is frustrated with something Strongly agree to strongly disagree
DOG shows aggression when nervous or fearful Strongly agree to strongly disagree
DOG becomes aggressive when excited Strongly agree to strongly disagree
DOG behaves aggressively in response to perceived threats from people (e.g., being cornered, having collar reached for) Strongly agree to strongly disagree

0.497 DOG behaves aggressively towards unfamiliar people Strongly agree to strongly disagree
0.454 DOG aggressively guards coveted items (e.g., stolen item, treats, food bowl) Strongly agree to strongly disagree
0.428 DOG behaves aggressively during visits to the veterinarian Strongly agree to strongly disagree


NA 0.382 DOG behaves aggressively toward other dogs Strongly agree to strongly disagree 
-0.383 -0.373 DOG shows barrier aggression Never to always 


Factor 6: Dog Sociability {- less sociable - | + highly sociable + } 


-0.914 -0.792 DOG is friendly towards other dogs Strongly agree to strongly disagree
-0.847 -0.742 DOG is playful with other dogs Strongly agree to strongly disagree
0.724 0.654 DOG avoids other dogs Strongly agree to strongly disagree
0.591 0.563 DOG behaves aggressively toward other dogs Strongly agree to strongly disagree
0.513 0.521 DOG behaves fearfully towards other dogs Strongly agree to strongly disagree

-0.318 -0.337 DOG willingly shares HIS toys with other dogs Strongly agree to strongly disagree


Pattern Structure Question

Factor 7: Environmental Engagement {- high engagement - | + low engagement + }


-0.734 
-0.692 
0.490 
0.456 
0.423 


0.480 
0.493 
0.407 
0.342 
-0.336 


-0.486 
-0.347 
0.310 


-0.467 
-0.461 
0.427 
0.403 
0.373 


0.369 
0.345 
0.342 
0.335 
-0.320 


-0.315 
NA 
NA 


DOG is lethargic
DOG seems dull or depressed, not alert
DOG reacts very quickly
DOG has difficulty finding food dropped on the floor
DOG gets stuck behind objects and is unable to get around
DOG stares blankly at the walls or floor
DOG lays in one place all day long
DOG can be very persistent
DOG is curious
DOG paces up and down, walks in circles and/or wanders with no direction or purpose
DOG sleeps more, is less awake
DOG pants frequently, even at rest
DOG moves normally

Factor 8: Proximity Seeking { - affectionate - | + aloof +}

DOG is affectionate
DOG seeks companionship from people
DOG walks away or avoids being patted
DOG is aloof
DOG is a people person
DOG doesn't like to be approached or hugged
DOG responds to my presence


Strongly agree to strongly disagree 
Strongly agree to strongly disagree 
Strongly agree to strongly disagree 
Never to always 
Never to always 


Never to always 
Never to always 
Strongly agree to strongly disagree 
Strongly agree to strongly disagree 
Strongly agree to strongly disagree 


Strongly agree to strongly disagree 
Strongly agree to strongly disagree 
Strongly agree to strongly disagree 


Strongly agree to strongly disagree 
Strongly agree to strongly disagree 
Never to always 
Strongly agree to strongly disagree 
Strongly agree to strongly disagree 


Strongly agree to strongly disagree 


Strongly agree to strongly disagree 
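Table S4 lists a pattern loading and a structure loading for each item. Having both kinds of loading is characteristic of an oblique factor rotation. The general relationship between the two is given below for reference. This table does not report the rotation method or the factor correlation matrix, so the identity is the textbook one and does not recompute any value shown.

```latex
\mathbf{S} = \mathbf{P}\,\boldsymbol{\Phi},
\qquad
s_{ij} = \sum_{k=1}^{m} p_{ik}\,\phi_{kj}
```

The symbols are defined as follows:

- P (items by m factors) holds the pattern loadings, which are the regression weights of each item on the factors.
- Φ (m by m) holds the correlations among factors.
- S holds the structure loadings, which are the item-factor correlations.

Under an orthogonal rotation Φ = I, so the two columns would coincide. Gaps such as 0.943 versus 0.773 for "DOG enjoys playing with toys" therefore reflect correlated factors.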
Table S5. Biallelic SNPs unique to sampled breed populations.


Population random N all SNPs autosomal x 


mutt 375,474 361,663 13,811 




belgian tervuren 
border collie 
entlebucher 

german shepherd dog 
golden retriever 
greyhound 

labrador retriever 
leonberger 
portuguese water dog 
rottweiler 

tibetan mastiff 

west highland white terrier 
yorkshire terrier 


0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 


19,018 
51,279 
27,627 
21,211 
35,055 
46,073 
43,346 
36,611 
40,372 
39,021 


651,551 


30,757 
66,791 


18,165 
48,421 
26,465 
19,127 
33,513 
43,535 
41,462 
34,972 
39,328 
37,609 


629,712 


29,579 
62,703 


breed mean: 
breed SD: 


85,286 
170,611 


101,875 
165,032 
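The "breed mean" and "breed SD" values can be recomputed from the thirteen per-breed counts in the first numeric block. That block holds the larger of the two values listed for each breed, which we read as the all-SNPs column. The sketch assumes the SD is the sample standard deviation (n - 1 denominator). With that choice, the results round to the 85,286 and 170,611 shown. Mean and SD do not depend on which breed each count belongs to, so the flattened breed-to-value pairing does not matter here.

```python
import statistics

# Per-breed counts of unique biallelic SNPs transcribed from Table S5 in
# listing order (read here as the "all SNPs" column; 13 breeds).
all_snps = [
    19_018, 51_279, 27_627, 21_211, 35_055, 46_073, 43_346,
    36_611, 40_372, 39_021, 651_551, 30_757, 66_791,
]

breed_mean = statistics.mean(all_snps)
breed_sd = statistics.stdev(all_snps)  # sample SD, n - 1 denominator

print(f"breed mean: {breed_mean:,.0f}")
print(f"breed SD:   {breed_sd:,.0f}")
```

A single very large count (651,551) dominates the spread. This is why the SD is roughly twice the mean.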
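Table S6. Concordance of owner-reported breed with genetically inferred breed ancestry.

Classification | Group | Owner report | Reference breed | N | Top breed at 85%+ | Breed matches top ancestry | Breed matches at 85%+
preliminary | non-purebred | 0 or 2 breeds | either | 1186 | 38 | 402 | 20
preliminary | single breed | 1 breed | present | 880 | 517 | 752 | 511
preliminary | single breed | 1 breed | absent | 89 | 12 | 0 | 0
preliminary | registered purebred | 1 breed & registered purebred | present | 304 | [illegible] | 299 | 271
preliminary | registered purebred | 1 breed & registered purebred | absent | 19 | [illegible] | 0 | 0
final | mutt | 0 or 2 breeds; [illegible] breed ancestry | either | 1221 | 0 | 382 | 0
final | candidate purebred | 1 breed | either | 934 | 584 | [illegible] | 541
final | candidate purebred | 1 breed | present | 814 | 541 | 772 | 531
final | candidate purebred | 1 breed | absent | 120 | 43 | 0 | 0
final | confirmed purebred | 1 breed reported & registered purebred, or 85%+ breed ancestry | either | 633 | 584 | 559 | 531
final | confirmed purebred | 1 breed reported & registered purebred, or 85%+ breed ancestry | present | 573 | 541 | 559 | 531
final | confirmed purebred | 1 breed reported & registered purebred, or 85%+ breed ancestry | absent | 60 | 43 | 0 | 0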
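Within each group, the "either" row should equal the sum of the "present" and "absent" rows. This allows the counts to be cross-checked and simple concordance rates to be derived. The sketch below does this for the final confirmed purebred group. Two ratios are computed: "Breed matches top ancestry" over N (reference breed present), and "Breed matches at 85%+" over "Top breed at 85%+". Treating these as the relevant rates is our interpretation of the column headers, not a definition given in the table.

```python
# Final "confirmed purebred" rows of Table S6, transcribed as
# (N, top breed at 85%+, breed matches top ancestry, breed matches at 85%+).
confirmed = {
    "either": (633, 584, 559, 531),
    "present": (573, 541, 559, 531),
    "absent": (60, 43, 0, 0),
}


def rows_add_up(group: dict[str, tuple[int, ...]]) -> bool:
    """True if every 'either' count equals 'present' + 'absent'."""
    return all(
        e == p + a
        for e, p, a in zip(group["either"], group["present"], group["absent"])
    )


def rate(numerator: int, denominator: int) -> float:
    return numerator / denominator


n, top_85, match_top, match_85 = confirmed["present"]
print(rows_add_up(confirmed))
print(f"matches top ancestry / N:      {rate(match_top, n):.3f}")
print(f"matches at 85%+ / top at 85%+: {rate(match_85, top_85):.3f}")
```

The same check applied to the candidate purebred rows does not fully balance (531 + 0 against 541), which may reflect an extraction error in that row.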


Table S7. The correlation of population peculiarity scores among traits.

Comparison | Confirmed purebreds: N | r | p | 5% CI | 95% CI | Candidate purebreds: N | r | p | 5% CI | 95% CI
behavioral vs. physical | 3,790 | 0.045 | 0.0059 | 0.013 | 0.076 | 46,202 | 0.064 | 1.20E-43 | 0.055 | 0.073
behavioral trait questions | 27,137 | 0.055 | 2.30E-19 | 0.043 | 0.066 | 353,925 | 0.137 | 0 | 0.134 | 0.141
physical trait questions | 120 | -0.101 | 0.275 | -0.275 | 0.08 | 1,296 | 0.018 | 0.528 | -0.037 | 0.072
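The confidence bounds in Table S7 are consistent with the usual Fisher z-transform interval for a Pearson correlation. For example, r = -0.101 with N = 120 reproduces the -0.275 to 0.08 bounds shown for confirmed purebreds. Whether the authors used exactly this method is not stated here. Treat the sketch as a consistency check under that assumption, with z_crit = 1.96 for a two-sided 95% interval.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Two-sided 95% CI for a Pearson r via Fisher's z-transform."""
    z = math.atanh(r)              # variance-stabilizing transform of r
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Confirmed purebreds, physical trait questions row: N = 120, r = -0.101.
lower, upper = fisher_ci(-0.101, 120)
print(f"{lower:.3f} to {upper:.3f}")
```

Because the interval is symmetric on the z scale but not on the r scale, it is slightly wider below r than above it for negative correlations.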
Table S8. Validation of breed ancestry effects against PPS and breed standards.


Breed Standard Trait | # breeds* | % correctly predicted | Counts (by predicted effect) | Discordant Breed(s)
9 ankle high; 8 hip high; 1 other
3 all white
14 dropped; 8 pricked | jack russell terrier
6 long; 3 short


* breeds that are significant in LMER analysis 
** bull terrier excluded because it comes in two sizes (miniature and standard), and LMER model predicted bull terrier ancestry contributes to smaller size.




Table S9. Breed propensities inferred only from the effects of breed ancestry in mutts. 


Class | Trait | Breed | Phenotype direction | N | REML t | ML ANOVA p-value (FDR adjusted)
behavior question | Q110: More good days | akita | disagree | 24 | 4.77 | 6.88E-05
behavior factor | F3: Toy-directed Motor Patterns | shar pei | not toy-directed | 93 | 4.57 | 1.61E-04
behavior question | Q40: Damages doors | chesapeake bay retriever | always | 17 | 4.15 | 0.0011
behavior factor | F8: Proximity Seeking | saint bernard | affectionate | 54 | -4.11 | 0.0016
behavior question | Q52: Works at tasks | shar pei | disagree | 28 | 3.89 | 0.0039
behavior question | Q40: Damages doors | shar pei | always | 27 | 3.66 | 0.0059
behavior question | Q2: Wants to play | shar pei | disagree | 28 | 3.78 | 0.0064
physical trait | Q125: Ear shape | bloodhound | dropped | 62 | -3.04 | 0.0100
behavior question | Q17: Howls | bloodhound | always | 63 | 3.60 | 0.0109
physical trait | Q121: Size | mastiff | hip high | 10 | 2.82 | 0.0160
behavior question | Q31: Ignores commands | shar pei | agree | 27 | -3.23 | 0.0204
behavior question | Q35: Escapes from enclosures | chesapeake bay retriever | always | 17 | 3.46 | 0.0233
behavior question | Q26: As active as has been | english cocker spaniel | disagree | 25 | 3.20 | 0.0319
physical trait | Q127: Fur length | old english sheepdog | long | 29 | 2.90 | 0.0374
physical trait | Q127: Fur length | samoyed | long | 28 | [illegible] | 0.0462
physical trait | Q127: Fur length | irish setter | long | 18 | 2.64 | 0.0462
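The last column of Table S9 reports FDR-adjusted p-values, but the table does not name the adjustment. Assuming it is the common Benjamini-Hochberg procedure, the adjustment can be sketched as follows. The raw p-values in the example are hypothetical, because Table S9 lists only adjusted values.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Step down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values (Table S9 reports only adjusted values).
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

The running minimum is why adjacent rows can share an adjusted value. The last two rows of Table S9 both show 0.0462, which is the pattern this step produces.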
Table S10. Replication of physical trait associations found in literature for dogs and other animals.

Published trait | Published association | Published p-value | Published effect size | DOI / Reference | GWAS trait | GWAS peak | GWAS p-value | GWAS effect size | Replicated canine locus?

chondrodysplasia (retrogene insertion; Canis lupus familiaris) | 18:2031083-20742377 | 1.90E-111 | N/A | 10.1126/science.1173275 (67) | Q121 Size | 18:20423056:A:G | 1.20E-26 | -0.39 | TRUE

chondrodysplasia (retrogene insertion; Canis lupus familiaris) | 12:32413668-43115518 | <5.00E-02 | N/A | 10.1073/pnas.1709082114 (70) | Q121 Size | 12:33887652:G:A | 2.18E-08 | -0.14 | TRUE

stature | 10:8454499 | 7.06E-09 | N/A | 10.1101/gr.157339.113 (69) | Q121 Size | 10:8356059:G:T | 1.80E-24 | -0.31 | TRUE
stature | | | | 10.1038/ng2121 (138)

stature | 7:43865905 | 1.05E-06 | N/A | | Q121 Size | 7:43868045:C:T | 3.82E-09 | -0.14 | TRUE

stature | 4:39200720, 4:67026055 | 9.10E-09 | N/A | 10.1101/gr.157339.113 (69) | Q121 Size | 4:67040898:C:T | 2.80E-10 | -0.15 | TRUE

[illegible] | 3:91269525 | 4.04E-23 | N/A | 10.1038/s41467-019-09373-w (40) | Q121 Size | 3:91108467:T:C | 2.00E-18 | 0.29 | TRUE
stature | | | | 10.1371/journal.pgen.1000409 (139)
[illegible] | | | | 10.1371/journal.pone.0056497 (140)
stature | | | | 10.1111/age. (141)
[illegible] | | | | 10.1534/genetics.110.123943 (142)
stature | | | | 10.1038/s41588-018-0056-5 (143)

[illegible] | 15:40446920-40447659 | 5.90E-05 | N/A | pnas.152333099 (64) | Q121 Size | 15:41274602:G:T | 2.00E-17 | 0.18 | TRUE
stature | 15:34903589-34903860 | 2.80E-02 | N/A | 10.1101/gr.3712705 (65)
stature | 15:41221438 | 2.00E-16 | N/A | 10.1126/science.1137045 (66)

[illegible] | 34:20097018-212633271 | [illegible] | N/A | 10.1038/s41467-019-09373-w (40) | Q121 Size | 34:18296868:C:T | 4.10E-09 | 0.12 | TRUE

metabolism | | | | 10.1186/1750-1172-6-78 (72) | Q121 Size | 11:22737397:G:A | 1.30E-13 | 0.32 | FALSE
metabolism | | | | 10.1186/1750-1172-6-78 (72)

stature | 26:12796099-13004170 | 2.08E-11 | [illegible] | 10.1038/s41467-019-09373-w (40) | Q121 Size | 26:12838979:C:A | 2.90E-37 | -0.78 | TRUE

dwarfism | | | | 10.1016/j.ajhg.2019.06.011 (74) | Q121 Size | 17:36295546:C:T | 4.10E-08 | 0.15 | FALSE

stature | 3:41849479 | 3.6E-25, 1.4E-38 | | 10.1007/s00335-012-9417-2 (68) | Q121 Size (Tiny) | 3:42107672:A:G | 1.30E-13 | 0.07 | TRUE
stature | | | | 10.1159/000437324 (144)
metabolism | | | | 10.1172/JCI42447 (145)

body size | | | | 10.1371/[illegible] (73) | Q122 Merle coat | 10:2677840:A:G | 3.10E-08 | -0.16 | FALSE

birth weight | | | | 10.3389/fgene.2020.00588 (146) | Q121 Size (Giant) | 30:32302089:C:T | 7.80E-13 | 0.09 | FALSE

white spotting | 20:21836232-21836429 | | | 10.1093/jhered/esp029 (147) | Q123 White fur | 20:21827323:C:T | 2.90E-37 | -0.78 | TRUE
white spotting | 20:21839331-21839366 | | | 10.1038/ng.2007.10 (14)
coat color | 20:21786368-21869849 | 5.99E-29 | | 10.1038/s41467-019-09373-w (40)
white spotting | | | | 10.1371/journal.pone.0104363 (148)
white spotting | | | | 10.1111/age.12751 (149)
pigmentation | | | | 10.1080/00071668.2017.1379053 (150)

ear shape | 10:8090498 | | | 10.1186/s12864-015-1702-2 (151) | Q125 Ear shape | 10:8027948:C:T | 1.40E-27 | 0.10 | TRUE
ear shape | | | | 10.1186/s12711-018-0442-6 (152)

[illegible] | | | | 10.1371/journal.pone.0102085 (153) | Q125 Ear shape | 10:8027948:C:T | 6.00E-23 | -0.33 | FALSE
ear shape | | | | 10.1016/S2095-3119(15)61173-X (154)

coat length | 32:4509367 | 3.08E-66 | N/A | 10.1111/j.1365-2052.2006.01448.x (155) | Q127 Fur length | 32:4509367:G:T | 5.50E-54 | 0.37 | TRUE
coat length | 32:4528617-4528633, 32:4528621-4528621, 32:4528639-4528639 | NA | NA | 10.1111/age.12010 (156)
coat length | 32:7473337 | 1.00E-157 | N/A | 10.1126/science.1177808 (58)
[illegible] | | | | 10.1111/j.1365-2052.2007.01590.x (157)

[illegible] | 13:8568727-8694401 | 4.00E-292 | N/A | 10.1126/science.1177808 (58) | Q128 Fur texture | 13:8491477:A:C | 3.40E-13 | 0.27 | TRUE
"Improper coat" in PWD | 13:8610419 | NA | NA | 10.1093/jhered/esq068 (158)

tick | 38:11122467 and 38:11124294 | 1.70E-05 | N/A | 10.1111/age.13040 (59) | Q122 Ticking (piebald-only) | 38:11165134:G:A | 5.31E-16 | 0.20 | TRUE
ticking | 38:11085443 | 3.6E-08 | NA | 10.1371/journal.pone.0248233 (159)

brindle | 16:43818953-57246829 | N/A | NA | 10.1534/genetics.107.074237 (63) | Q122 Brindle coat | 16:59013740:G:A | [illegible] | 0.43 | TRUE
brindle | 16:58965448-58965450 | [illegible] | NA | 10.1126/science.1147880 (61)
brindle | 16:58965448-58965450 | N/A | NA | 10.1292/jvms.10-0439 (62)

[illegible] | 10:292851 | | | 10.1073/pnas.0506940103 (160) | Q122 Merle coat | 10:371299:T:G | 6.40E-09 | 0.12 | TRUE
merle | | | | 10.1159/000491408 (161)
merle | | | | 10.1556/004.2019.018 (162)

vitiligo | | | | 10.1038/ng.602 (163) | Q122 Ticking (piebald-only) | 40:20888915:A:G | 3.10E-09 | 0.23 | FALSE
hair loss | | | | 10.1242/dev.097477 (164)

[illegible] | 2:74746906 | 1.48E-154 | 0.95 | 10.1371/journal.pone.0250579 (60) | Red intensity | 2:74851797:T:C | 2.00E-08 | 0.12 | TRUE
gray hairs | | | | 10.3892/etm.2019.7663 (165)

liver | 0.4 Mb from 11:34516748-34517020 | N/A | NA | 10.1186/1746-6148-1-1 (166) | Q122 Liver coat | 11:33326685:C:T | 5.30E-16 | 0.20 | TRUE
liver | 11:33326719-33326719 | 1.08E-19 | N/A | 10.1111/age.12839 (167)
liver | 11:33317810 | NA | NA | 10.1111/age.12337 (168)
liver | | | | 10.1007/s00335-002-2001-7 (169)
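Table S10 pairs published loci with GWAS peaks written as chromosome:position:allele:allele. The table does not state how "Replicated canine locus?" was decided. The sketch below therefore only measures how far a GWAS peak lies from a published position or interval. It assumes nothing about allele order in the ID. Any distance threshold for calling a locus replicated would be a further assumption and is not included.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int
    allele1: str
    allele2: str


def parse_variant(text: str) -> Variant:
    """Split a 'chrom:pos:allele:allele' ID such as '7:43868045:C:T'."""
    chrom, pos, a1, a2 = text.split(":")
    return Variant(chrom, int(pos), a1, a2)


def distance_to_locus(peak: Variant, chrom: str, start: int,
                      end: Optional[int] = None) -> Optional[int]:
    """Base pairs from a GWAS peak to a published position or interval.

    Returns 0 if the peak falls inside the interval and None if the
    chromosomes differ.
    """
    if peak.chrom != chrom:
        return None
    end = start if end is None else end
    if start <= peak.pos <= end:
        return 0
    return min(abs(peak.pos - start), abs(peak.pos - end))


# Two rows from Table S10: a point locus and an interval locus.
print(distance_to_locus(parse_variant("7:43868045:C:T"), "7", 43865905))
print(distance_to_locus(parse_variant("12:33887652:G:A"), "12", 32413668, 43115518))
```

The first call gives the distance between the chromosome 7 peak and its published position. The second returns 0 because the chromosome 12 peak lies inside the published interval.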
Table S11. Correlation of age-related questions in Darwin's Ark and the Canine Cognitive Dysfunction Rating scale.

Darwin's Ark: Pearson correlation of PPS with year of age (8 yr +) | Salvin et al 2011 (35): Spearman correlation of "% dogs that frequently perform behavior" with age

# and Question from Darwin's Ark | years | r | P | CCDR question | 8-10 years | 10-12 years | >12 years | Correlation | Direction of score | Match?
[illegible] (agree-disagree) | 7 | [illegible] | [illegible] | [illegible] playing | 96% | 95% | 88% | -1 | opposite | TRUE
[illegible] (agree-disagree) | [illegible] | [illegible] | [illegible] | [illegible] locomotion | 13% | 21% | 39% | 1 | same | TRUE
#36 DOG gets stuck behind objects and is unable to get around (never-always) | 7 | 0.77 | 0.044 | Gets stuck behind objects | 10% | 11% | 22% | 1 | [illegible] | TRUE
#39 DOG lays in one place all day long (never-always) | 7 | 0.97 | 4.20E-04 | Time spent active | 52% | 44% | 29% | -1 | opposite | TRUE
#61 DOG has difficulty finding food dropped on the floor (never-always) | 7 | 0.97 | 3.20E-04 | Difficulty finding dropped food | 21% | 26% | 51% | 1 | same | TRUE
#95 DOG stares blankly at the walls or floor (never-always) | 6 | 0.77 | 0.072 | Stares blankly | 10% | 11% | 22% | 1 | same | TRUE


Table S12. Correlation of age-related questions in Darwin's Ark and Canine Health-related and Quality of Life Survey.

Question | Darwin's Ark, Pearson correlation of PPS with year of age: p | r | Lavan et al 2013 (36): p*
Q1 DOG enjoys life (agree-disagree) | 4.60E-07 | 0.93 | 0.063
Q2 DOG wants to play (agree-disagree) | 2.51E-09 | 0.97 | 0.0001
Q25 DOG moves normally (agree-disagree) | 1.35E-07 | 0.94 | 0.0001
Q26 DOG is as active as HE has been (agree-disagree) | 4.74E-09 | 0.97 | 0.0001
Q33 DOG sleeps more, is less awake (agree-disagree) | 4.20E-10 | -0.98 | 0.0001
Q39 DOG lays in one place all day long (never-always) | 6.46E-07 | 0.93 | 0.024
Q43 DOG responds to my presence (agree-disagree) | 8.43E-04 | 0.77 | 0.007
Q94 DOG pants frequently, even at rest (agree-disagree) | 3.71E-05 | -0.86 | 0.016
Q97 DOG shakes or trembles occasionally (agree-disagree) | 2.69E-07 | -0.95 | 0.135
Q103 DOG seems dull or depressed, not alert (agree-disagree) | 3.50E-09 | -0.97 | 0.002
Q110 DOG has more good days than bad days (agree-disagree) | 4.42E-05 | 0.87 | 0.033

Spearman's rank correlation rho = 0.633
Spearman's rank correlation p = 0.037

* p from Kruskal-Wallis one-way ANOVA; this is the only statistical metric reported in Lavan et al 2013
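The Spearman rho = 0.633 reported with Table S12 can be reproduced by ranking the Darwin's Ark p values against the Lavan et al p* values across the 11 questions. Which columns were compared is our reading; the table does not say. The sketch computes Spearman's rho as the Pearson correlation of average ranks. This handles the four tied p* values of 0.0001.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Table S12, rows Q1, Q2, Q25, Q26, Q33, Q39, Q43, Q94, Q97, Q103, Q110.
darwins_ark_p = [4.60e-07, 2.51e-09, 1.35e-07, 4.74e-09, 4.20e-10, 6.46e-07,
                 8.43e-04, 3.71e-05, 2.69e-07, 3.50e-09, 4.42e-05]
lavan_p = [0.063, 0.0001, 0.0001, 0.0001, 0.0001, 0.024,
           0.007, 0.016, 0.135, 0.002, 0.033]

print(round(spearman(darwins_ark_p, lavan_p), 3))
```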
Table S13. Set of SNPs with MAF>0.025 in each population randomly sampled.

Population | All | Illumina array SNPs | Axiom array SNPs | Low-coverage sequencing SNPs
wolves | 18,216,839 | 133,193 | 678,689 | 7,944,474
village dogs | 12,710,126 | 152,094 | 758,076 | 8,719,223
mutts | 12,491,157 | 160,666 | 779,353 | 9,076,003
yorkshire terriers | 8,983,416 | 141,619 | 646,509 | 7,140,504
labrador retrievers | 8,407,324 | 137,550 | 619,205 | 6,806,734
golden retrievers | 8,348,982 | 136,917 | 615,892 | 6,760,319
leonbergers | 7,135,029 | 120,717 | 523,873 | 5,689,553
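Table S13 counts SNPs whose minor allele frequency exceeds 0.025 within each randomly sampled population. Below is a minimal sketch of that filter. It assumes genotypes coded as alternate-allele dosages (0/1/2) with no missing calls. The dosage coding and the example genotypes are illustrative and are not taken from the authors' pipeline. The comparison is strict, so a SNP at exactly 0.025 is excluded, matching the ">" in the title.

```python
from typing import Sequence

MAF_THRESHOLD = 0.025  # from the Table S13 title: MAF > 0.025


def minor_allele_frequency(dosages: Sequence[int]) -> float:
    """MAF for a biallelic SNP from per-dog alternate-allele dosages (0, 1, 2)."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf(dosages: Sequence[int], threshold: float = MAF_THRESHOLD) -> bool:
    """Strict inequality, matching 'MAF>0.025'."""
    return minor_allele_frequency(dosages) > threshold


# Hypothetical genotypes for 20 randomly sampled dogs at two SNPs.
snp_a = [0] * 19 + [1]     # one heterozygote: MAF = 1/40 = 0.025, excluded
snp_b = [0] * 18 + [1, 2]  # MAF = 3/40 = 0.075, kept
print(passes_maf(snp_a), passes_maf(snp_b))
```

With small samples the threshold is coarse. For 20 dogs, a single copy of the minor allele (1/40) sits exactly at 0.025 and is excluded.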
Table S14. Population peculiarity score datasets. Variable numbers of dogs available for random sampling have a small effect on the magnitude of PPS z scores in the candidate breed analysis.

Set type | # tests | median # sets/trait (range) | # dogs per sample | # dogs available for sampling per set (range) | Effect of # dogs available for sampling on abs(z) scores (ANOVA): ges | DFd | F | p
year of age | 1,375 | 11 | 100 | 639 - 1814 | 1.00E-03 | 1,249 | [illegible] | 0.26
candidate purebred | 7,478 | 60 (51 - 62) | 25 | 25 - 905 | 0.028 | 7,356 | 208 | 1.50E-46
 | 4,708 | 37 (28 - 44) | 25 | 50 - 905 | 0.013 | 4,587 | 59.2 | 1.70E-14
confirmed purebred | 604 | [illegible] (5 - 6) | 50 | 100 - 316 | 0.01 | 498 | 4.8 | 0.028

ANOVA model: anova_test(abs(z) ~ ndogs_available+trait_id)
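The ges column (generalized eta squared) can be cross-checked from F and DFd. For a between-subjects term, we assume ges reduces to SS_effect / (SS_effect + SS_error). That ratio equals F·DFn / (F·DFn + DFd). The sketch also assumes DFn = 1 for ndogs_available as a single numeric term, a value the table does not print. The recomputed values land within 0.001 of those reported, consistent with rounding of the printed F values. This is a consistency check only. The model line names anova_test, which appears to be an R function, and the sketch is not the authors' analysis code.

```python
def eta_squared_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic: SS_effect / (SS_effect + SS_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, DFd, reported ges) for the Table S14 rows with a legible F value.
# df_effect = 1 is an assumption: ndogs_available is a single numeric term.
rows = {
    "7,478 tests": (208.0, 7356, 0.028),
    "4,708 tests": (59.2, 4587, 0.013),
    "604 tests": (4.8, 498, 0.01),
}

for label, (f, dfd, reported) in rows.items():
    recomputed = eta_squared_from_f(f, 1, dfd)
    print(f"{label}: recomputed {recomputed:.4f}, reported {reported}")
```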
Table S15. Expected vs. observed rates of correct guesses in Mutt Mix.

Columns: ID | Name | # guesses | user guesses 2 breeds: 1+ correct (exp, obs) | 2 correct (exp, obs) | user guesses 3 breeds: 1+ correct (exp, obs) | 2+ correct (exp, obs) | 3 correct (exp, obs)

1009 Bailey 14,160 0.178 0.959 0.00541 0.16695 0.26 0.957 0.01606 0.31511 1.60E-04 1.30E-02 
1010 Bella 14,117 0.169 0.881 0.00505 0.02006 0.247 0.892 0.01505 0.24285 1.50E-04 4.00E-03 
1011 Beskow 13,696 0.281 0.194 0.00949 0.00494 0.395 0.316 0.02711 0.01761 9.20E-05 0.00E+00 
1012 Boone 13,975 0.167 0.872 0.00429 0.23839 0.244 0.922 0.01281 0.44669 7.50E-05 3.80E-03 
1013 Buddy 14,060 0.197 0.975 0.00681 0.02922 0.286 0.964 0.0201 0.06796 2.20E-04 4.80E-04 
1014 Clarence 13,969 0.362 0.726 0.02428 0.18354 0.496 0.865 0.06671 0.41076 1.20E-03 6.00E-02 
1015 Cooper 13,903 0.232 0.949 0.00864 0.37128 0.333 0.941 0.02519 0.53381 2.10E-04 9.40E-02 
1016 Dug 13,662 0.351 0.774 0.02281 0.03458 0.483 0.79 0.06298 0.07397 1.30E-03 3.60E-04 
1017 Esme 13,737 0.329 0.729 0.01835 0.12847 0.455 0.798 0.05122 0.24978 9.00E-04 2.50E-03 
1019 Hershey 13,944 0.231 0.805 0.00968 0.20193 0.332 0.851 0.02821 0.37112 3.90E-04 4.30E-02 
1020 Hopper 14,044 0.254 0.604 0.01171 0.00517 0.362 0.668 0.03381 0.01203 5.10E-04 0.00E+00 
1021 Jack1 14,041 0.1 0.211 0.00165 0 0.149 0.351 0.00503 0.00795 2.70E-05 0.00E+00
1022 Jack2 13,781 0.319 0.213 0.01649 0.00284 0.443 0.358 0.04622 0.02047 7.30E-04 8.30E-05 
1023 Kaylee 14,102 0.223 0.963 0.00923 0.27557 0.32 0.958 0.02696 0.3417 3.80E-04 9.90E-04 
1024 Lilly 14,036 0.227 0.958 0.00963 0.57413 0.326 0.955 0.02811 0.62244 4.10E-04 3.20E-02 
1025 Lola 14,006 0.054 0.772 0.00047 0.14896 0.082 0.852 0.00146 0.32036 3.90E-06 1.70E-02 
1026 Lucy 13,984 0.148 0.028 0.00358 0 0.218 0.112 0.01074 0.00167 8.00E-05 0.00E+00 
1027 Luna 13,931 0.192 0.595 0.00644 0.00315 0.279 0.67 0.01903 0.02794 2.00E-04 0.00E+00 
1028 Maxine 13,941 0.285 0.206 0.01572 0.0288 0.402 0.309 0.04481 0.06574 8.40E-04 3.50E-03
1030 Peso 14,024 0.221 0.987 0.00757 0.0691 0.317 0.966 0.02217 0.13451 1.80E-04 3.00E-03 
1031 Ramy 13,848 0.348 0.069 0.02199 0.00292 0.479 0.141 0.06082 0.00636 1.20E-03 1.70E-04 
1033 Rex 13,870 0.273 0.766 0.01398 0.18317 0.386 0.806 0.04007 0.29694 6.80E-04 1.60E-02 
1034 Rosie 14,113 0.379 0.323 0.02816 0.00114 0.516 0.573 0.07671 0.09173 1.80E-03 4.00E-04 
1037 Sadie 13,990 0.206 0.728 0.00769 0.08899 0.298 0.864 0.02262 0.33852 2.80E-04 9.60E-03 
1039 Scotch 13,846 0.176 0.044 0.00497 0.00051 0.256 0.114 0.01478 0.00312 1.10E-04 0.00E+00 
1042 Zandy 14,058 0.192 0.94 0.00671 0.34548 0.28 0.936 0.01983 0.4682 2.30E-04 3.10E-02 
1018 Gus 13,854 0.214 0.741 0.00628 0.01074 0.308 0.731 0.01848 0.04687 
1032 Reilly 13,990 0.246 0.102 0.00454 0 0.351 0.194 0.01315 0.00218 
1035 Ruby 14,067 0.279 0.372 0.00914 0.00147 0.394 0.477 0.02614 0.00416
1036 Rudy 13,783 0.269 0.349 0.00767 0.01672 0.38 0.458 0.02204 0.06807 